Reference picture lists for 3DV

ABSTRACT

Several implementations relate to construction of reference picture lists for coding of 3DV content. One or more exemplary embodiments of the present invention include the construction of reference picture lists in accordance with an optimized inter-layer dependency structure. In addition, the reference picture lists may be prioritized according to the degree of redundancy the reference pictures have with respect to the picture for which the reference picture list is constructed. In one implementation, a priority is determined for an inter-layer reference for a picture. The priority is determined relative to one or more other references for the picture. The inter-layer reference is included in an ordered list of references for the picture based on the priority.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of the filing date of U.S. Provisional Application Ser. No. 61/215,154 filed on May 1, 2009, entitled “3D Video Coding Formats,” the filing date of U.S. Provisional Application Ser. No. 61/215,874 filed on May 11, 2009, entitled “Reference Pictures for 3D Video,” and the filing date of U.S. Provisional Application Ser. No. 61/310,497, filed on Mar. 4, 2010, entitled “Extended SPS for 3DV sequences,” the contents of each of which are hereby incorporated by reference in their entirety for all purposes.

TECHNICAL FIELD

Implementations are described that relate to coding systems. Various particular implementations relate to three-dimensional (3D) video coding schemes.

BACKGROUND

To facilitate new video applications, such as three-dimensional television (3DTV) and free-viewpoint video (FVV), 3D Video (3DV) data formats comprising both conventional 2D video and depth maps can be utilized such that additional views can be rendered at the user end. Examples of such 3DV formats include 2D plus depth (2D+Z), which includes a two-dimensional (2D) video and its corresponding depth map, and Layered Depth Video (LDV), which includes 2D+Z and an occlusion video plus an occlusion depth. Other examples of such 3DV formats include Multiview plus Depth (MVD) and Disparity Enhanced Stereo (DES). MVD is an extension of 2D+Z, as it includes multiple 2D+Z from different viewpoints. In turn, DES is composed of two LDVs from two different viewpoints. Another example 3DV format is Layered Depth Video plus Right View (LDV+R), which is composed of one LDV of a left view and the 2D video of the right view. How to convey (encode and transmit) the data in all these formats is a challenging issue, as different components are used jointly at the user end to decode 3DV content.

SUMMARY

According to a general aspect, a priority is determined for an inter-layer reference for a picture. The priority is determined relative to one or more other references for the picture. The inter-layer reference is included in an ordered list of references for the picture based on the priority.

The details of one or more implementations are set forth in the accompanying drawings and the description below. Even if described in one particular manner, it should be clear that implementations may be configured or embodied in various manners. For example, an implementation may be performed as a method, or embodied as an apparatus, such as, for example, an apparatus configured to perform a set of operations or an apparatus storing instructions for performing a set of operations, or embodied in a signal. Other aspects and features will become apparent from the following detailed description considered in conjunction with the accompanying drawings and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an example of a depth map.

FIG. 2 is an example showing the four components of the LDV format.

FIG. 3 is a block/flow diagram of an implementation of a 3DV encoder.

FIG. 4 is a block/flow diagram of an implementation of a 3DV decoder.

FIG. 5 is a block/flow diagram of an implementation of a 3DV layer encoder.

FIG. 6 is a block/flow diagram of an implementation of a 3DV layer decoder.

FIG. 7 is a block/flow diagram of an implementation of a video transmission system.

FIG. 8 is a block/flow diagram of an implementation of a video receiving system.

FIG. 9 is a block/flow diagram of an implementation of a video processing device.

FIG. 10 is a diagram of an example of a 3DV coding structure.

FIG. 11 is a block/flow diagram of a first example of a Network Abstraction Layer (NAL) unit stream.

FIG. 12 is a block/flow diagram of a second example of a NAL unit stream.

FIG. 13 is a flow diagram of an example of a method for decoding 3DV content.

FIG. 14 is a flow diagram of an example of a method for encoding 3DV content.

FIG. 15 is a block diagram illustrating an example of an inter-layer dependency structure.

FIG. 16 is a flow diagram of an example of a method for constructing a reference picture list for an encoding process.

FIG. 17 is a flow diagram of an example of a method for constructing a reference picture list for a decoding process.

FIG. 18 is a flow diagram of an example of a method for encoding NAL units for an extended sequence parameter set for 3DV content.

FIG. 19 is a flow diagram of an example of a method for decoding NAL units for an extended sequence parameter set for 3DV content.

FIG. 20 is a flow diagram for an example of a method for encoding a sequence parameter set with extensions.

FIG. 21 is a flow diagram for an example of a method for decoding a sequence parameter set with extensions.

FIG. 22 is a block/flow diagram of an example of a first method for encoding a sequence parameter subset for an inter-layer dependency structure for 3DV content.

FIG. 23 is a block/flow diagram of an example of a first method for decoding a sequence parameter subset for an inter-layer dependency structure for 3DV content.

FIG. 24 is a block/flow diagram of an example of a second method for encoding a sequence parameter subset for an inter-layer dependency structure for 3DV content.

FIG. 25 is a block/flow diagram of an example of a second method for decoding a sequence parameter subset for an inter-layer dependency structure for 3DV content.

FIG. 26 is a flow diagram of an example of a method for encoding 3DV content.

FIG. 27 is a flow diagram of an example of a method for decoding 3DV content.

FIG. 28 is a flow diagram of an example of a method for constructing a reference picture list for a coding operation.

FIG. 29 is a flow diagram of an example of a method for processing 2D video layer pictures that may be implemented in the method of FIG. 28.

FIG. 30 is a flow diagram of an example of a method for encoding 3DV content and conveying inter-layer dependency structures.

FIG. 31 is a flow diagram of an example of a method for decoding 3DV content and conveying inter-layer dependency structures.

FIG. 32 is a block/flow diagram of an example of a NAL unit stream.

FIG. 33 is a block/flow diagram of an example of a system for managing network traffic by employing inter-layer dependency structures.

FIG. 34 is a flow diagram of an example of a method for managing network traffic by employing inter-layer dependency structures.

DETAILED DESCRIPTION

As understood in the art, a basic tenet of 3D Video (3DV) is typically to provide different views of a scene or an object to each eye of a user so that a user is able to perceive depth of the scene or object. Additionally, to enhance a user experience, a virtual view other than the views being transmitted may be rendered, for example, to adjust the baseline distance for a different perceived depth range. To achieve one or more of these goals, as noted above, 3DV representation formats, such as 2D+Z (MVD) and LDV (DES), may include various layers, such as video, depth, and perhaps more supplemental information. To better illustrate the concept of depth and other supplemental information for 3DV content, reference is made to FIGS. 1 and 2.

FIG. 1 provides an example of a depth map 100 corresponding to a conventional video. In addition, FIG. 2 includes an example of the four components in the LDV format: 2D video 202 plus depth (Z) 204 and an occlusion video 206 for the same scene along with an occlusion depth 208. Encoding and transmission of the above-described data formats are challenging in many respects. For example, besides coding efficiency, functionalities such as synchronization and backward compatibility (for conventional monoscopic 2D video) should also preferably be provided so that a legacy Advanced Video Coding (AVC)/Multiview Coding (MVC) decoder can extract a viewable video from the bitstream.

One solution that can address at least some of these issues is simulcast, where each view and/or layer is encoded and transmitted independently. This approach may use multiple encoders and decoders to encode and decode the separate views/layers, respectively, and to synchronize the views/layers into a viewable image at the system level or application level. For example, Moving Picture Experts Group (MPEG)-C Part 3 (International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 23002-3) specifies a system framework for 2D+Z. Typical implementations use synchronization at a system level between the video and depth. The video and depth can be coded using any existing video coding standard. However, in typical implementations the encoding of the video and the encoding of the depth are decoupled. Thus, the cost of simulcast is typically multiplied by the number of views and/or layers transmitted. Furthermore, because different views and/or layers are encoded separately, any redundancy among views and/or layers is typically not exploited to achieve higher encoding efficiency.

In contrast, one or more implementations described herein may permit inter-layer coding to exploit redundancy between layers, and thereby to achieve higher encoding efficiency, in addition to backward compatibility with AVC/MVC systems. In particular, one or more implementations provide means to permit synchronization of views and/or layers at a coding level to attain at least some of these benefits. For example, in at least one implementation described herein, a novel 3DV prefix Network Abstraction Layer (NAL) unit and a novel 3DV NAL unit header extension on the NAL unit design of AVC are proposed to efficiently enable inter-layer coding and synchronization of views/layers. The high level syntax signals how the 3DV components can be extracted from bitstreams, such as AVC and Scalable Video Coding (SVC)/MVC bitstreams. Thus, this approach has the advantage that there is no need for synchronization between different 3DV components at the system level, as the 3DV components can be coupled in the coded bitstream (such as SVC layers, or MVC views). Another potential benefit is that inter-layer or inter-view redundancy can be removed when encoded in this manner. Further, the novel NAL unit design can be compatible with MVC and can also permit compatibility with any future encapsulating coding techniques to achieve enhanced compression efficiency.

As discussed herein below, to enable synchronization for different views/layers at the coding level as opposed to the system level, one or more implementations associate 3DV NAL unit designs with a 3DV view identifier (ID) and a 3DV layer ID. Moreover, to better exploit inter-view/layer redundancy, inter-view/layer predictions are employed to provide higher coding efficiency as compared to AVC with interleaving methods. In addition, NAL unit designs for 3DV supplemental layers may achieve full backward compatibility while enabling the development of new coding modes/tools without affecting 2D view layer compatibility with MVC/AVC.

Various embodiments are directed to the configuration of a reference list to permit encoding and decoding of bitstreams including 3DV content by employing multiple-reference prediction. For example, for 3DV coding structures, there may be at least three possible types of reference pictures, including, for example: temporal reference pictures, inter-view reference pictures, and reference pictures from different 3DV layers. Reference pictures from different 3DV layers may include, for example, a 2D video layer used as reference for a depth layer. At least one embodiment described in this application provides the concept and implementation of how to arrange the three types of reference pictures in a reference picture list. For example, when encoding a macroblock (MB) in prediction mode, an encoder can signal which picture is, or pictures are, used as reference among multiple reference pictures that are available. Here, an index in the list can indicate which reference picture is used. As discussed further herein below, one or more embodiments can provide one or more inter-layer reference pictures in the list in order to enable inter-layer prediction.
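By way of illustration only, the arrangement of the three reference types in a single ordered list, and the use of an index into that list, might be sketched as follows. The class names, picture labels, and numeric priority values are hypothetical and are not taken from any embodiment; in particular, how a priority is derived is not shown here and is simply assumed to be given.

```python
from dataclasses import dataclass
from enum import Enum


class RefType(Enum):
    TEMPORAL = "temporal"
    INTER_VIEW = "inter-view"
    INTER_LAYER = "inter-layer"


@dataclass(frozen=True)
class RefPicture:
    name: str          # hypothetical label, for illustration only
    ref_type: RefType
    priority: int      # assumption: a lower value is placed earlier in the list


def build_reference_list(candidates):
    """Return the candidates ordered by priority (sorted() keeps ties stable)."""
    return sorted(candidates, key=lambda ref: ref.priority)


# Candidate references for a depth picture of view 0 at time t (all hypothetical).
candidates = [
    RefPicture("view0_depth_t-1", RefType.TEMPORAL, priority=1),
    RefPicture("view1_depth_t", RefType.INTER_VIEW, priority=2),
    RefPicture("view0_2d_t", RefType.INTER_LAYER, priority=0),
]

ref_list = build_reference_list(candidates)

# An encoder would signal an index into the list; index 0 selects the first entry.
ref_idx = 0
chosen = ref_list[ref_idx]
```

In this sketch the inter-layer reference (the 2D video picture used as a reference for the depth picture) happens to receive the first position, so it is addressed by the smallest index.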

As noted above, one or more embodiments provide many advantages, one of which is potential compatibility with MVC. That is, when a 3DV bitstream according to one of these embodiments is fed to a legacy MVC decoder, the 2D video (for example, specified as layer 0 below) can be decoded and outputted. To further aid compatibility with MVC while at the same time permitting efficient coding of 3DV content using a variety of layers, various embodiments are additionally directed to the construction and signaling of a sequence parameter set (SPS). As understood by those of skill in the technical field, an SPS can specify common properties shared between pictures of a sequence of pictures. Such common properties may include, for example, picture size, optional coding modes employed, and a macroblock to slice group map, each of which may optionally be shared between pictures in a sequence. For at least one embodiment, an extension of SPS is employed to signal novel sequence parameters that are used for encoding and decoding 3DV content. Moreover, a separate and novel NAL unit type can be utilized for the extended SPS. The extended SPS can be used by network devices, such as a router, to adapt the bitrate of 3DV content streaming, as discussed further herein below.

Prior to discussing embodiments in specific detail, some discussion of terms employed is provided to facilitate understanding of the concepts described.

TERMINOLOGY

A “2D video” layer is generally used herein to refer to the traditional video signal.

A “depth” layer is generally used herein to refer to data that indicates distance information for the scene objects. A “depth map” is a typical example of a depth layer.

An “occlusion video” layer is generally used herein to refer to video information that is occluded from a certain viewpoint. The occlusion video layer typically includes background information for the 2D video layer.

An “occlusion depth” layer is generally used herein to refer to depth information that is occluded from a certain viewpoint. The occlusion depth layer typically includes background information for the depth layer.

A “transparency” layer is generally used herein to refer to a picture that indicates depth discontinuities or depth boundaries. A typical transparency layer has binary information, with one of the two values indicating positions for which the depth has a discontinuity, with respect to neighboring depth values, greater than a particular threshold.

A “3DV view” is defined herein as a data set from one view position, which is different from the “view” used in MVC. For example, a 3DV view may include more data than the view in MVC. For the 2D+Z format, a 3DV view may include two layers: 2D video plus its depth map. For the LDV format, a 3DV view may include four layers: 2D video, depth map, occlusion video, and occlusion depth map. In addition, a transparency map can be another layer data type within a 3DV view, among others.

A “3DV layer” is defined as one of the layers of a 3DV view. Examples of 3DV layers are 2D view or video, depth, occlusion video, occlusion depth, and transparency map. Layers other than 2D view or video are also defined as “3DV supplemental layers”. In one or more embodiments, a 3DV decoder can be configured to identify a layer and distinguish that layer from others using a 3dv_layer_id. In one implementation, 3dv_layer_id is defined as in Table 1. However, it should be noted that the layers may be defined and identified in other ways, as understood by those of ordinary skill in the art in view of the teachings provided herein.

TABLE 1
3DV layers

Value of 3dv_layer_id    Description
0                        2D video
1                        Depth
2                        Occlusion video
3                        Occlusion depth
4                        Transparency map
>=5                      Reserved
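As a minimal sketch, the assignments of Table 1 and the layer sets given above for the 2D+Z and LDV formats could be represented as follows. The Python names (Layer3DV, FORMAT_LAYERS, describe_layer, is_supplemental) are hypothetical and chosen only for illustration; only the numeric values and layer groupings come from the description.

```python
from enum import IntEnum


class Layer3DV(IntEnum):
    """3dv_layer_id values as listed in Table 1."""
    VIDEO_2D = 0
    DEPTH = 1
    OCCLUSION_VIDEO = 2
    OCCLUSION_DEPTH = 3
    TRANSPARENCY_MAP = 4


# Layers carried by one 3DV view in the 2D+Z and LDV formats.
FORMAT_LAYERS = {
    "2D+Z": (Layer3DV.VIDEO_2D, Layer3DV.DEPTH),
    "LDV": (
        Layer3DV.VIDEO_2D,
        Layer3DV.DEPTH,
        Layer3DV.OCCLUSION_VIDEO,
        Layer3DV.OCCLUSION_DEPTH,
    ),
}


def describe_layer(layer_id: int) -> str:
    """Map a 3dv_layer_id to a name; values of 5 and above are reserved."""
    if layer_id < 0:
        raise ValueError("3dv_layer_id must be non-negative")
    if layer_id >= 5:
        return "Reserved"
    return Layer3DV(layer_id).name


def is_supplemental(layer_id: int) -> bool:
    """True for the defined layers other than 2D video (ids 1 through 4)."""
    return 1 <= layer_id <= 4
```

A decoder that only understands layer 0 could use a check such as is_supplemental to skip the remaining layers.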

FIGS. 3 and 4 illustrate a high-level generic 3DV encoder 300 and decoder 400, respectively. The encoder 300/decoder 400 is composed of layer encoders/decoders and a 3DV reference buffer. For example, a 3DV content signal 302, which may include, for example, 2D view, depth, occlusion view, occlusion depth, and transparency map layers, is input to the various layer encoders as shown in FIG. 3. Specifically, the encoder system/apparatus 300 includes a 2D layer encoder 304 configured to encode 2D layers, which may be AVC compatible, an enhanced 2D layer encoder 306 configured to encode enhanced 2D layers, a depth layer encoder 308 configured to encode depth layers, an occlusion view layer encoder 310 configured to encode occlusion view layers, an occlusion depth layer encoder 312 configured to encode occlusion depth layers, and a transparency layer encoder 314 configured to encode transparency layers. Thus, each layer can be encoded using a different encoder and/or encoding technique.

An enhanced 2D layer is generally used herein to distinguish such a layer from a layer that is compatible with AVC, MVC, SVC, or some other underlying standard. For example, enhanced 2D layers are typically not compatible with MVC because such layers allow new coding tools, such as, for example, using inter-layer references. Such layers are, therefore, generally not backward compatible with MVC.

Note that the term “enhanced 2D layer” (or supplemental layer) may also be used to refer to layers that could be coded with MVC, but which would not be expected to be displayed and so are not typically described as being coded with MVC. For example, a series of depth layers could be treated by MVC as a series of pictures and could be coded by MVC. However, it is not typical to display depth layers, so it is often desirable to have a different way of identifying and coding such layers, other than by using MVC.

Each layer can also use a different reference. The reference may be from a different layer than the picture/block being encoded (decoded). The references from different layers may be obtained from a 3DV Reference Buffer 316 (3DV Reference/Output Buffer 414). As shown in FIG. 3, each layer encoder is in signal communication with the 3DV reference buffer 316 to permit various modes of encoding of the input signal 302 to generate an output signal 318.

By utilizing the 3DV Reference Buffer 316, each layer of the 3DV format can be encoded using references from its own layer, such as, for example, temporal references and/or inter-view references within the same layer with motion and/or disparity compensation, and/or using inter-layer prediction between the various layers. For example, an inter-layer prediction may reuse motion information, such as, for example, motion vector, reference index, etc., from another layer to encode the current layer, also referred to as motion skip mode. In this way, the output signal 318 may be interleaved with various layer information for one or more 3DV views. The inter-layer prediction may use any kind of technique that is based on access to the other layers.

With regard to the decoder system/apparatus 400, system 400 includes various layer decoders to which signal 318 may be input as shown in FIG. 4. In particular, the decoder system/apparatus 400 includes a 2D layer decoder 402, which may be AVC compatible, configured to decode 2D layers, an enhanced 2D layer decoder 404 configured to decode enhanced 2D layers, a depth layer decoder 406 configured to decode depth layers, an occlusion view layer decoder 408 configured to decode occlusion view layers, an occlusion depth layer decoder 410 configured to decode occlusion depth layers, and/or a transparency layer decoder 412 configured to decode transparency layers.

As illustrated in FIG. 4, each layer decoder is in signal communication with a 3DV reference/output buffer 414, which can be configured to parse decoded layer information received from the layer decoders and to determine how the layers included in the input signal fit into a structure that supports 3D processing. Such 3D processing may include, for example, coding of 3D layers as described herein or rendering (synthesizing) of additional pictures at a receiver or display unit. Rendering may use, for example, depth pictures to warp a 2D video and/or occlusion pictures to fill in holes of a rendered picture with background information.

In addition, the 3DV reference/output buffer 414 can be configured to generate an output signal 416 in a 3DV compatible format for presentation to a user. The formatted 3DV content signal 416 may include, for example, 2D view, depth, occlusion view, occlusion depth, and transparency map layers. The output buffer may be implemented together with the reference buffer, as shown in FIG. 4, or, alternatively in other embodiments, the reference and output buffers may be separated.

Other implementations of the encoder 300 and the decoder 400 may use more or fewer layers. Additionally, different layers than those shown may be used.

It should be clear that the term “buffer”, as used in the 3DV Reference Buffer 316 and in the 3DV Reference/Output Buffer 414, refers to an intelligent buffer. Such buffers may be used, for example, to store pictures, to provide references (or portions of references), and to reorder pictures for output. Additionally, such buffers may be used, for example, to perform various other processing operations such as, for example, hypothetical reference decoder testing, processing of marking commands (for example, memory management control operations in AVC), and decoded picture buffer management.
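A minimal sketch of three of the buffer functions named above (storing pictures, providing references, and reordering pictures for output) is given below. It assumes, purely for illustration, that each stored picture is keyed by a 3DV view ID, a 3DV layer ID, and an integer output-order index; the class and method names are hypothetical, and marking commands and hypothetical reference decoder testing are not modeled.

```python
class ReferenceBuffer3DV:
    """Toy reference/output buffer keyed by (view_id, layer_id, order)."""

    def __init__(self):
        self._pictures = {}

    def store(self, view_id, layer_id, order, picture):
        """Store a decoded picture under its view, layer, and output order."""
        self._pictures[(view_id, layer_id, order)] = picture

    def get_reference(self, view_id, layer_id, order):
        """Return the requested picture, or None if it is not in the buffer."""
        return self._pictures.get((view_id, layer_id, order))

    def output_order(self, view_id, layer_id):
        """Return the pictures of one layer of one view, sorted for output."""
        keys = sorted(
            key for key in self._pictures
            if key[0] == view_id and key[1] == layer_id
        )
        return [self._pictures[key] for key in keys]


buf = ReferenceBuffer3DV()
# Pictures may arrive in decoding order, which can differ from output order.
buf.store(view_id=0, layer_id=1, order=2, picture="depth_2")
buf.store(view_id=0, layer_id=1, order=0, picture="depth_0")
buf.store(view_id=0, layer_id=0, order=0, picture="video_0")

# A depth layer coder can fetch the 2D video picture as an inter-layer reference.
inter_layer_ref = buf.get_reference(view_id=0, layer_id=0, order=0)
```

Keying on both the view ID and the layer ID is what lets a single buffer serve temporal, inter-view, and inter-layer lookups in this sketch.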

FIGS. 5 and 6 depict high level block/flow diagrams of a general 3DV layer encoder 500 and decoder 600, respectively, that can be used to implement any one or more of layer encoders 304-314 and any one or more of layer decoders 402-412, respectively. It is noted that each of the layer encoders 304-314 can be designed in the same general manner with respect to their corresponding layers, as, for example, depicted in FIG. 5, to favor particular purposes. Conversely, the layer encoders may be configured differently to better utilize their unique characteristics, as understood in view of the teachings provided herein. Similarly, decoders 402-412 can be designed in the same general manner with respect to their corresponding layers, as, for example, depicted in FIG. 6. Conversely, the layer decoders may be configured differently to better utilize their unique characteristics.

It should be noted that with regard to an MVC encoder, the input is composed of multiple views. Each view is a traditional 2D video. Thus, compared to an AVC encoder, the typical MVC encoder includes additional blocks such as a disparity estimation block, a disparity compensation block, and an inter-view reference buffer. Analogously, FIGS. 5 and 6 include blocks for 3DV references and inter-layer prediction. With a 3DV encoder, the input is composed of multiple 3D views. As stated above, each 3D view can comprise several layers. Accordingly, the encoding method for each layer can be designed differently to utilize the unique features of that layer. Consequently, a 3DV encoder can be divided into layer encoders, as shown in FIG. 3. However, the layer encoders may also be closely coupled. The techniques used in the layer encoders may be tailored as desired for a given system. Since each layer appears as a video signal, the layers can have a similar structure at a high level as shown in FIG. 5. It should be noted that the layer encoders can be differently designed at lower, more specific levels. Of course, one embodiment may also use a single encoder configured to encode all layers.

With regard to the high level diagram illustrated in FIG. 5, 3DV layer encoder 500 may include a layer partitioner 504 configured to receive and partition 3DV view layers from each other for a 3DV view i within input signal 502. The partitioner 504 is in signal communication with an adder or combiner 506, with a displacement (motion/disparity) compensation module 508, and with a displacement (motion/disparity) estimation module 510, each of which receives a set of partitioned layers from partitioner 504. Another input to the adder 506 is one of a variety of possible reference picture information received through switch 512.

For example, if a mode decision module 536 in signal communication with the switch 512 determines that the encoding mode should be intra-prediction with reference to the same block or slice currently being encoded, then the adder receives its input from intra-prediction module 530. Alternatively, if the mode decision module 536 determines that the encoding mode should be displacement compensation and estimation with reference to a block or slice, of the same frame or 3DV view or 3DV layer currently being processed or of another previously processed frame or 3DV view or 3DV layer, that is different from the block or slice currently being encoded, then the adder receives its input from displacement compensation module 508, as shown in FIG. 5. Further, if the mode decision module 536 determines that the encoding mode should be 3DV inter-layer prediction with reference to a 3DV layer, of the same frame or 3DV view currently being processed or another previously processed frame or 3DV view, that is different from the layer currently being processed, then the adder receives its input from the 3DV inter-layer prediction module 534, which is in signal communication with 3DV Reference Buffer 532.
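The three-way selection performed through switch 512 can be sketched as below. This is an illustrative model only: the function and enum names are hypothetical, blocks are represented as flat lists of sample values, and the combiner is assumed (as in typical hybrid video encoders, and not stated explicitly above) to form a residual by subtracting the selected prediction from the current block.

```python
from enum import Enum


class Mode(Enum):
    INTRA = "intra-prediction"                  # source: module 530
    DISPLACEMENT = "displacement compensation"  # source: module 508
    INTER_LAYER = "3DV inter-layer prediction"  # source: module 534


def select_prediction(mode, intra_pred, displacement_pred, inter_layer_pred):
    """Model of switch 512: pass through the prediction for the decided mode."""
    sources = {
        Mode.INTRA: intra_pred,
        Mode.DISPLACEMENT: displacement_pred,
        Mode.INTER_LAYER: inter_layer_pred,
    }
    return sources[mode]


def combine(block, prediction):
    """Assumed behavior of the combiner: residual = block - prediction."""
    return [sample - pred for sample, pred in zip(block, prediction)]


current_block = [10, 12, 11, 9]
prediction = select_prediction(
    Mode.INTER_LAYER,
    intra_pred=[8, 8, 8, 8],
    displacement_pred=[10, 10, 10, 10],
    inter_layer_pred=[9, 12, 11, 10],
)
residual = combine(current_block, prediction)
```

How the mode decision itself is reached (for example, by comparing coding costs) is outside the scope of this sketch.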

The adder 506 provides a signal including 3DV layer(s) and prediction, compensation, and/or estimation information to the transform module 514, which is configured to transform its input signal and provide the transformed signal to quantization module 516. The quantization module 516 is configured to perform quantization on its received signal and output the quantized information to an entropy encoder 518. The entropy encoder 518 is configured to perform entropy encoding on its input signal to generate bitstream 520. The inverse quantization module 522 is configured to receive the quantized signal from quantization module 516 and perform inverse quantization on the quantized signal. In turn, the inverse transform module 524 is configured to receive the inverse quantized signal from module 522 and perform an inverse transform on its received signal. Modules 522 and 524 recreate or reconstruct the signal output from adder 506.
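The forward path (transform, quantization) and the reconstruction path (inverse quantization, inverse transform) can be illustrated with a deliberately simple stand-in. The sketch below assumes a two-point sum/difference transform and uniform scalar quantization with a hypothetical step size; it is not the transform or quantizer of AVC or of any embodiment, and entropy coding is omitted.

```python
def forward_transform(residual):
    """Stand-in transform: map each pair (a, b) to (a + b, a - b)."""
    coeffs = []
    for i in range(0, len(residual), 2):
        a, b = residual[i], residual[i + 1]
        coeffs.extend([a + b, a - b])
    return coeffs


def inverse_transform(coeffs):
    """Invert the stand-in transform: (s, d) -> ((s + d) / 2, (s - d) / 2)."""
    samples = []
    for i in range(0, len(coeffs), 2):
        s, d = coeffs[i], coeffs[i + 1]
        samples.extend([(s + d) / 2, (s - d) / 2])
    return samples


def quantize(coeffs, step):
    """Uniform scalar quantization (assumed step size)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Inverse quantization: scale the levels back by the step size."""
    return [level * step for level in levels]


def encode_and_reconstruct(residual, step):
    """Return the quantized levels and the encoder-side reconstruction."""
    levels = quantize(forward_transform(residual), step)
    reconstructed = inverse_transform(dequantize(levels, step))
    return levels, reconstructed


levels, reconstructed = encode_and_reconstruct([8, 3], step=4)
```

With the inputs shown, the reconstruction differs from the original residual because quantization is lossy; running the inverse path at the encoder lets it hold the same approximation that a decoder would obtain.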

The adder or combiner 526 adds (combines) signals received from the inverse transform module 524 and the switch 512 and outputs the resulting signals to intra prediction module 530 and deblocking filter 528. Further, the intra prediction module 530 performs intra-prediction, as discussed above, using its received signals. Similarly, the deblocking filter 528 filters the signals received from adder 526 and provides filtered signals to 3DV reference buffer 532.

The 3DV reference buffer 532, in turn, parses its received signal. The 3DV reference buffer 532 aids in inter-layer and displacement compensation/estimation encoding, as discussed above, by elements 534, 508, and 510. The 3DV reference buffer 532 provides, for example, all or part of various 3DV layers.

With reference again to FIG. 6, the 3DV layer decoder 600 can be configured to receive bitstream 318 using bitstream receiver 602, which in turn is in signal communication with bitstream parser 604 and provides the bitstream to parser 604. The bitstream parser 604 can be configured to transmit a residue bitstream 605 to entropy decoder 606, transmit control syntax elements 607 to mode selection module 622, transmit displacement (motion/disparity) vector information 609 to displacement compensation (motion/disparity) module 618, and transmit coding information 611 from 3DV layers other than the 3DV layer currently decoded to 3DV inter-layer prediction module 620. The inverse quantization module 608 can be configured to perform inverse quantization on an entropy decoded signal received from the entropy decoder 606. In addition, the inverse transform module 610 can be configured to perform an inverse transform on an inverse quantized signal received from inverse quantization module 608 and to output the inverse transformed signal to adder or combiner 612.
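The routing performed by the bitstream parser 604 can be sketched as a simple dispatch table. The sketch assumes, hypothetically, that the parser has already split the bitstream into (kind, payload) pairs; the kind strings, destination labels, and payload values are illustrative only and do not reflect actual syntax.

```python
# Destination of each kind of parsed information, following the description above.
ROUTES = {
    "residue": "entropy_decoder_606",
    "control_syntax": "mode_selection_622",
    "displacement_vectors": "displacement_compensation_618",
    "inter_layer_coding_info": "inter_layer_prediction_620",
}


def route(elements):
    """Distribute (kind, payload) pairs to per-destination queues."""
    queues = {destination: [] for destination in ROUTES.values()}
    for kind, payload in elements:
        queues[ROUTES[kind]].append(payload)
    return queues


# Hypothetical parsed elements for one block.
parsed = [
    ("control_syntax", {"mode": "inter_layer"}),
    ("residue", b"\x01\x02"),
    ("inter_layer_coding_info", {"source_layer": 0}),
]
queues = route(parsed)
```

An unknown kind raises a KeyError in this sketch, which is one simple way to surface unsupported syntax rather than silently dropping it.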

Adder 612 can receive one of a variety of other signals depending on the decoding mode employed. For example, the mode selection module 622 can determine whether 3DV inter-layer prediction, displacement compensation or intra prediction encoding was performed on the currently processed block by the encoder 500 by parsing and analyzing the control syntax elements 607. Depending on the determined mode, the mode selection module 622 can access and control switch 623, based on the control syntax elements 607, so that the adder 612 can receive signals from the 3DV inter-layer prediction module 620, the displacement compensation module 618 or the intra prediction module 614.

Here, the intra prediction module 614 can be configured to, for example, perform intra prediction to decode a block or slice using references to the same block or slice currently being decoded. In turn, the displacement compensation module 618 can be configured to, for example, perform displacement compensation to decode a block or a slice using references to a block or slice, of the same frame or 3DV view or 3DV layer currently being processed or of another previously processed frame or 3DV view or 3DV layer, that is different from the block or slice currently being decoded. Further, the 3DV inter-layer prediction module 620 can be configured to, for example, perform 3DV inter-layer prediction to decode a block or slice using references to a 3DV layer, of the same frame or 3DV view currently processed or of another previously processed frame or 3DV view, that is different from the layer currently being processed.

After receiving prediction or compensation information signals, the adder 612 can add the prediction or compensation information signals to the inverse transformed signal for transmission to a deblocking filter 602. The deblocking filter 602 can be configured to filter its input signal and output decoded pictures. The adder 612 can also output the added signal to the intra prediction module 614 for use in intra prediction. Further, the deblocking filter 602 can transmit the filtered signal to the 3DV reference buffer 616. The 3DV reference buffer 616 can be configured to parse its received signal to permit and aid in inter-layer and displacement compensation decoding, as discussed above, by elements 618 and 620, to each of which the 3DV reference buffer 616 provides parsed signals. Such parsed signals may be, for example, all or part of various 3DV layers.

It should be understood that systems/apparatuses 300, 400, 500, and 600 can be configured differently and can include different elements as understood by those of ordinary skill in the art in view of the teachings disclosed herein.

FIG. 7 illustrates a video transmission system/apparatus 700, to which aspects described herein may be applied, in accordance with an implementation. The video transmission system 700 may be, for example, a head-end or transmission system for transmitting a signal using any of a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The transmission may be provided over the Internet or some other network.

The video transmission system 700 is capable of generating and delivering, for example, video content and depth, along with other 3DV supplemental layers. This is achieved by generating an encoded signal(s) including 3DV supplemental layer information or information capable of being used to synthesize the 3DV supplemental layer information at a receiver end that may, for example, have a decoder.

The video transmission system 700 includes an encoder 710 and atransmitter 720 capable of transmitting the encoded signal. The encoder710 receives video information and generates an encoded signal(s) basedon the video information and/or 3DV layer information. The encoder 710may be, for example, the encoder 300 described in detail above. Theencoder 710 may include sub-modules, including for example an assemblyunit for receiving and assembling various pieces of information into astructured format for storage or transmission. The various pieces ofinformation may include, for example, coded or uncoded video, coded oruncoded depth information, and coded or uncoded elements such as, forexample, motion vectors, coding mode indicators, and syntax elements.

The transmitter 720 may be, for example, adapted to transmit a programsignal 750 having one or more bitstreams representing encoded picturesand/or information related thereto. Typical transmitters performfunctions such as, for example, one or more of providingerror-correction coding, interleaving the data in the signal,randomizing the energy in the signal, and modulating the signal onto oneor more carriers using modulator 722. The transmitter 720 may include,or interface with, an antenna (not shown). Further, implementations ofthe transmitter 720 may include, or be limited to, a modulator.

Referring to FIG. 8, FIG. 8 shows a video receiving system/apparatus 800 to which the aspects described herein may be applied, in accordance with an implementation. The video receiving system 800 may be configured to receive signals over a variety of media, such as, for example, satellite, cable, telephone-line, or terrestrial broadcast. The signals may be received over the Internet or some other network.

The video receiving system 800 may be, for example, a cell-phone, a computer, a set-top box, a television, or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage. Thus, the video receiving system 800 may provide its output to, for example, a screen of a television, a computer monitor, a computer (for storage, processing, or display), or some other storage, processing, or display device.

The video receiving system 800 is capable of receiving and processing video content including video information. The video receiving system 800 includes a receiver 810 capable of receiving an encoded signal, such as for example the signals described in the implementations of this application, and a decoder 820 capable of decoding the received signal.

The receiver 810 may be, for example, adapted to receive a program signal having a plurality of bitstreams representing encoded pictures. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal from one or more carriers using a demodulator 822, de-randomizing the energy in the signal, de-interleaving the data in the signal, and error-correction decoding the signal. The receiver 810 may include, or interface with, an antenna (not shown). Implementations of the receiver 810 may include, or be limited to, a demodulator.

The decoder 820 outputs video signals including video information and depth information. The decoder 820 may be, for example, the decoder 400 described in detail above.

The input to the system 700 is listed, in FIG. 7, as “input video(s)”, and the output from the system 800 is listed, in FIG. 8, as “output video”. It should be clear that, at least in these implementations, these refer to 3D videos that include multiple layers.

With reference to FIG. 9, FIG. 9 illustrates a video processing device 900 to which aspects described herein may be applied, in accordance with an implementation. The video processing device 900 may be, for example, a set top box or other device that receives encoded video and provides, for example, decoded video for display to a user or for storage. Thus, the video processing device 900 may provide its output to a television, computer monitor, or a computer or other processing device.

The video processing device 900 includes a front-end (FE) device 905 and a decoder 910. The front-end device 905 may be, for example, a receiver adapted to receive a program signal having a plurality of bitstreams representing encoded pictures, and to select one or more bitstreams for decoding from the plurality of bitstreams. Typical receivers perform functions such as, for example, one or more of receiving a modulated and encoded data signal, demodulating the data signal, decoding one or more encodings (for example, channel coding and/or source coding) of the data signal, and/or error-correcting the data signal. The front-end device 905 may receive the program signal from, for example, an antenna (not shown). The front-end device 905 provides a received data signal to the decoder 910.

The decoder 910 receives a data signal 920. The data signal 920 may include, for example, one or more Advanced Video Coding (AVC), Scalable Video Coding (SVC), or Multi-view Video Coding (MVC) compatible streams.

AVC refers more specifically to the existing International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 Recommendation (hereinafter the “H.264/MPEG-4 AVC Standard” or variations thereof, such as the “AVC standard” or simply “AVC”).

MVC refers more specifically to a multi-view video coding (“MVC”) extension (Annex H) of the AVC standard, referred to as H.264/MPEG-4 AVC, MVC extension (the “MVC extension” or simply “MVC”).

SVC refers more specifically to a scalable video coding (“SVC”) extension (Annex G) of the AVC standard, referred to as H.264/MPEG-4 AVC, SVC extension (the “SVC extension” or simply “SVC”).

The decoder 910 decodes all or part of the received signal 920 and provides as output a decoded video signal 930. The decoded video 930 is provided to a selector 950. The device 900 also includes a user interface 960 that receives a user input 970. The user interface 960 provides a picture selection signal 980, based on the user input 970, to the selector 950. The picture selection signal 980 and the user input 970 indicate which of multiple pictures, sequences, scalable versions, views, or other selections of the available decoded data a user desires to have displayed. The selector 950 provides the selected picture(s) as an output 990. The selector 950 uses the picture selection information 980 to select which of the pictures in the decoded video 930 to provide as the output 990.

In various implementations, the selector 950 includes the user interface 960, and in other implementations no user interface 960 is needed because the selector 950 receives the user input 970 directly without a separate interface function being performed. The selector 950 may be implemented in software or as an integrated circuit, for example. In one implementation, the selector 950 is incorporated with the decoder 910, and in another implementation, the decoder 910, the selector 950, and the user interface 960 are all integrated.

In one application, front-end 905 receives a broadcast of various television shows and selects one for processing. The selection of one show is based on user input of a desired channel to watch. Although the user input to front-end device 905 is not shown in FIG. 9, front-end device 905 receives the user input 970. The front-end 905 receives the broadcast and processes the desired show by demodulating the relevant part of the broadcast spectrum, and decoding any outer encoding of the demodulated show. The front-end 905 provides the decoded show to the decoder 910. The decoder 910 is an integrated unit that includes devices 960 and 950. The decoder 910 thus receives the user input, which is a user-supplied indication of a desired view to watch in the show. The decoder 910 decodes the selected view, as well as any required reference pictures from other views, and provides the decoded view 990 for display on a television (not shown).

Continuing the above application, the user may desire to switch the view that is displayed and may then provide a new input to the decoder 910. After receiving a “view change” from the user, the decoder 910 decodes both the old view and the new view, as well as any views that are in between the old view and the new view. That is, the decoder 910 decodes any views that are taken from cameras that are physically located in between the camera taking the old view and the camera taking the new view. The front-end device 905 also receives the information identifying the old view, the new view, and the views in between. Such information may be provided, for example, by a controller (not shown in FIG. 9) having information about the locations of the views, or the decoder 910. Other implementations may use a front-end device that has a controller integrated with the front-end device.

The decoder 910 provides all of these decoded views as output 990. A post-processor (not shown in FIG. 9) interpolates between the views to provide a smooth transition from the old view to the new view, and displays this transition to the user. After transitioning to the new view, the post-processor informs (through one or more communication links not shown) the decoder 910 and the front-end device 905 that only the new view is needed. Thereafter, the decoder 910 only provides as output 990 the new view.
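By way of a non-limiting illustration, the selection of views decoded during a view change can be sketched as follows. The function name views_to_decode and the camera_order list (view identifiers listed in physical camera order, as a controller might supply them) are hypothetical and are not part of the described implementations.

```python
def views_to_decode(old_view, new_view, camera_order):
    """Return the old view, the new view, and every view located between them.

    camera_order is an assumed list of view identifiers sorted by the
    physical position of the cameras. The result runs from old to new.
    """
    start = camera_order.index(old_view)
    end = camera_order.index(new_view)
    step = 1 if end >= start else -1
    # Walk the camera positions from the old view to the new view inclusive.
    return [camera_order[k] for k in range(start, end + step, step)]


if __name__ == "__main__":
    print(views_to_decode(0, 2, [0, 1, 2, 3]))
```

Once the transition is complete, only the last element of the returned list (the new view) would still need to be decoded.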

The system/apparatus 900 may be used to receive multiple views of a sequence of images, and to present a single view for display, and to switch between the various views in a smooth manner. The smooth manner may involve interpolating between views to move to another view. Additionally, the system 900 may allow a user to rotate an object or scene, or otherwise to see a three-dimensional representation of an object or a scene. The rotation of the object, for example, may correspond to moving from view to view, and interpolating between the views to obtain a smooth transition between the views or simply to obtain a three-dimensional representation. That is, the user may “select” an interpolated view as the “view” that is to be displayed.

It should be clear that the video transmission system 700, the video receiving system 800, and the video processing device 900, may all be adapted for use with the various implementations described in this application. For example, systems 700, 800, and 900, may be adapted to operate with data in one of the 3DV formats discussed, as well as with the associated signaling information.

Embodiment 1 3DV Prefix NAL Unit

In this embodiment, a new NAL unit type is introduced and referred to as a “3DV prefix NAL unit,” denoted as “16,” which can precede Video Coding Layer (VCL) NAL units or MVC prefix NAL units (with nal_unit_type denoted as 14) for a particular 3DV view or 3DV layer. The VCL NAL units and MVC prefix units are described in detail in Gary Sullivan, et al., “Editors' draft revision to ITU-T Rec. H.264|ISO/IEC 14496-10 Advanced Video Coding”, JVT-AD007, January-February 2009, Geneva CH (hereinafter ‘AVC Draft’), incorporated herein by reference, which relates to proposed AVC standards. The meaning of many terms and abbreviations that are used but not explicitly defined herein can be found in the AVC draft and are understandable by those of ordinary skill in the relevant technical field. The use of “16” to denote the 3DV prefix NAL unit is arbitrary and can be chosen to be any reserved NAL unit type in the AVC draft.

Table 2 provided below is a modified version of Table 7-1 in the AVC draft for nal_unit_type codes and defines the 3DV prefix NAL unit 16. Table 7-1 in the AVC draft is reproduced below as Table 3. It should be noted that Table 2 also includes modifications for Embodiment 3, discussed in more detail below. The 3DV prefix NAL unit 16 permits MVC compatible decoders to decode all transmitted 3DV layers, including the 3DV supplemental layers, and also permits synchronization of 3DV views and layers at a coding level. Rows 2-5 (NAL unit types 16-23) of Table 2 reflect syntax changes to Table 3.

TABLE 2 NAL unit type codes, syntax element categories, and NAL unit type classes
nal_unit_type | Content of NAL unit and RBSP syntax structure | C | Annex A NAL unit type class | Annex G and Annex H NAL unit type class
0 . . . 15 | As defined in Table 7-1 in AVC draft | | |
16 | 3DV prefix NAL unit | | non-VCL | non-VCL
17 . . . 20 | Reserved | | |
21 | Coded 3DV slice extension 3dv_slice_layer_extension_rbsp( ) | 2, 3, 4 | non-VCL | VCL
22 . . . 23 | Reserved | | non-VCL | non-VCL
24 . . . 31 | As defined in Table 7-1 in AVC draft | | non-VCL | non-VCL

TABLE 3 NAL unit type codes, syntax element categories, and NAL unit type classes
nal_unit_type | Content of NAL unit and RBSP syntax structure | C | Annex A NAL unit type class | Annex G and Annex H NAL unit type class
0 | Unspecified | | non-VCL | non-VCL
1 | Coded slice of a non-IDR picture slice_layer_without_partitioning_rbsp( ) | 2, 3, 4 | VCL | VCL
2 | Coded slice data partition A slice_data_partition_a_layer_rbsp( ) | 2 | VCL | not applicable
3 | Coded slice data partition B slice_data_partition_b_layer_rbsp( ) | 3 | VCL | not applicable
4 | Coded slice data partition C slice_data_partition_c_layer_rbsp( ) | 4 | VCL | not applicable
5 | Coded slice of an IDR picture slice_layer_without_partitioning_rbsp( ) | 2, 3 | VCL | VCL
6 | Supplemental enhancement information (SEI) sei_rbsp( ) | 5 | non-VCL | non-VCL
7 | Sequence parameter set seq_parameter_set_rbsp( ) | 0 | non-VCL | non-VCL
8 | Picture parameter set pic_parameter_set_rbsp( ) | 1 | non-VCL | non-VCL
9 | Access unit delimiter access_unit_delimiter_rbsp( ) | 6 | non-VCL | non-VCL
10 | End of sequence end_of_seq_rbsp( ) | 7 | non-VCL | non-VCL
11 | End of stream end_of_stream_rbsp( ) | 8 | non-VCL | non-VCL
12 | Filler data filler_data_rbsp( ) | 9 | non-VCL | non-VCL
13 | Sequence parameter set extension seq_parameter_set_extension_rbsp( ) | 10 | non-VCL | non-VCL
14 | Prefix NAL unit prefix_nal_unit_rbsp( ) | 2 | non-VCL | suffix dependent
15 | Subset sequence parameter set subset_seq_parameter_set_rbsp( ) | 0 | non-VCL | non-VCL
16 . . . 18 | Reserved | | non-VCL | non-VCL
19 | Coded slice of an auxiliary coded picture without partitioning slice_layer_without_partitioning_rbsp( ) | 2, 3, 4 | non-VCL | non-VCL
20 | Coded slice extension slice_layer_extension_rbsp( ) | 2, 3, 4 | non-VCL | VCL
21 . . . 23 | Reserved | | non-VCL | non-VCL
24 . . . 31 | Unspecified | | non-VCL | non-VCL

A more detailed description of the proposed 3DV prefix NAL unit is shown in Table 4 below.

TABLE 4 3DV prefix NAL unit
3dv_prefix_nal_unit( ) { | C | Descriptor
  3dv_view_id | All | u(7)
  3dv_layer_id | All | u(3)
  reserved_bits | All | u(6)
} | |

As illustrated in Table 4, the 3DV prefix NAL unit may include a 3dv_view_id and a 3dv_layer_id. The 3dv_view_id specifies a 3DV view ID number of the frame associated with a 3DV view. In addition, the 3dv_layer_id specifies the 3DV layer ID number of the associated frame. The reserved_bits permits the NAL unit to be byte aligned. It should be understood that the numbers of bits used for each syntax element and their coding method are provided only as an example. It should also be noted that the header of NAL unit 16 can include a standard first byte, as in the first three elements of Table 9 below. In this embodiment, the NAL unit 16 can include a header and an extended header and need not include a payload. A NAL unit 16 can be transmitted, for example, prior to every 3DV layer frame or prior to every slice of a 3DV layer frame.
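As a non-limiting sketch, the sixteen bits of the extended header of Table 4 could be packed and parsed as shown below. The function names are hypothetical. The sketch assumes that the syntax elements are written most significant bit first in the order listed in Table 4 and that reserved_bits is written as zero; neither the bit order nor the reserved value is fixed by Table 4 itself.

```python
def pack_3dv_prefix(view_id, layer_id):
    """Pack 3dv_view_id u(7), 3dv_layer_id u(3) and six reserved bits into 2 bytes."""
    if not 0 <= view_id < (1 << 7):
        raise ValueError("3dv_view_id must fit in 7 bits")
    if not 0 <= layer_id < (1 << 3):
        raise ValueError("3dv_layer_id must fit in 3 bits")
    # Assumed layout: vvvvvvvl llrrrrrr (reserved bits r set to zero).
    value = (view_id << 9) | (layer_id << 6)
    return value.to_bytes(2, "big")


def parse_3dv_prefix(data):
    """Recover the Table 4 syntax elements from the first two bytes of data."""
    if len(data) < 2:
        raise ValueError("need at least 2 bytes")
    value = int.from_bytes(data[:2], "big")
    return {
        "3dv_view_id": value >> 9,
        "3dv_layer_id": (value >> 6) & 0x7,
        "reserved_bits": value & 0x3F,
    }


if __name__ == "__main__":
    print(parse_3dv_prefix(pack_3dv_prefix(2, 1)))
```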

To better illustrate how the 3DV prefix NAL unit may be employed, reference is made to FIG. 10, which shows an example of 3DV content comprising a structure 1000 of 3DV views 1002, 1004, and 1006. Here, views 1002, 1004, and 1006 provide different perspectives of the same scene or object. In this example, each 3DV view is further composed of two layers: 2D view 1010 plus its depth 1008. The arrows in FIG. 10 show the coding dependency between the different views and layers. For example, the B view 1004, a bi-directionally predicted view, for coding purposes, depends on and references the Base view 1002 and the P view 1006, a predictive view. Similarly, the P view 1006 depends on and references the base view 1002. Here, the depth layer 1008 of each 3DV view references the 2D view layer 1010 of the corresponding 3DV view. It should be noted that the 3DV views and dependencies could be extended to 3DV content having additional 3DV supplemental layers, such as those in accordance with MVD, LDV, DES formats, by persons of ordinary skill in the art in view of the teachings provided herein. It should also be noted that the dependencies provided in FIG. 10 are only examples and that the use of the 3DV prefix NAL unit permits a variety of other dependencies.

A NAL unit stream for the 3DV content in FIG. 10 in accordance with this embodiment is illustrated in FIG. 11. In particular, FIG. 11 provides a stream of NAL units 1100 for different times, T0 1102 and T1 1110, for a video presentation. Here, view 1104 and view 1112 (3DV View 0) correspond to base view 1002 at times T0 and T1, respectively, in that they are associated with the same perspective or viewpoint as base view 1002. Similarly, view 1106 and view 1114 (3DV view 2) correspond to P view 1006 at times T0 and T1, respectively, while view 1108 and view 1116 (3DV view 1) correspond to B view 1004 at times T0 and T1, respectively.

As shown in FIG. 11, each 3DV view is composed of a 2D view layer and a depth layer. However, it should be understood that additional supplemental layers can be employed in other embodiments. Here, view 1104 is composed of a 2D view layer 1118 and a depth layer 1120. The 2D view layer 1118 is itself composed of NAL units 16 (1126), 14 (1128), and 5 (1130), while the depth layer 1120 is composed of a NAL unit 16 and NAL unit 20 (1132). In turn, 2D view layer 1122 and depth layer 1124 of view 1106 are themselves composed of a NAL unit 16 and a NAL unit 20, as shown in FIG. 11. View 1112 is composed of both a depth layer, including NAL units 16 and 20, and a 2D view layer 1136, including NAL units 16, 14 and 1 (1134).

The arrows of FIG. 11 indicate the transmission order of NAL units. For example, NAL unit 16 (1126) is transmitted before NAL unit 14 (1128), which is itself transmitted before NAL unit 5 (1130), etc. NAL unit 16 is defined in Tables 2 and 4 while the other NAL units illustrated in FIG. 11 are defined in Table 3. For example, NAL unit 5 includes video data of a coded slice of an instantaneous decoding refresh (IDR) picture that is composed of only intra slices or SI slices, as defined in the AVC draft. Generally, the IDR picture is coded using intra prediction only or using intra prediction only and quantization of prediction samples. Further, NAL unit 1 includes video data of a coded slice of a non-IDR picture, such as a bi-directionally (B) coded picture or a predictively (P) coded picture, which in turn can reference other pictures, 3DV layers or 3DV views. In turn, NAL unit 20 is a coded slice extension that can reference another layer, as indicated, for example, in FIG. 10, or another 3DV view. It should also be noted that NAL units 1, 5 and 20 shown in FIG. 11 are representative of many such units and have been truncated for ease of presentation. For example, after prefix units 16 and 14 have been transmitted for 2D view 1118, several NAL units 5 (1130) can be transmitted until all slices of the corresponding frame have been sent. Similarly, after a prefix NAL unit 16 has been transmitted for a depth view, a plurality of NAL units 20 composing the depth layer frame can be transmitted. NAL unit 1 in FIG. 11 is similarly a truncated representation of the slices corresponding to the frame of the 2D view layer 1136.

Each NAL unit 14 is a prefix NAL unit, as described above, indicating an MVC view ID for its corresponding layer. For example, NAL unit 14 includes an MVC view ID for its corresponding 2D view layer 1118. Similarly, NAL unit 20 also includes an MVC view ID for its corresponding 3DV layer. In this embodiment, every 3DV layer is coded as a separate MVC view and thus is allocated a unique MVC view_id during its coding. The encoder, such as encoder 300 discussed above, can use the MVC view_id to indicate the dependency between layers and/or frames in a sequence parameter set (SPS), as discussed further herein below with respect to embodiments 5-7, and can specify the corresponding 3dv_view_id and 3dv_layer_id in the prefix NAL unit 16 such that the decoder, such as decoder 400, can interpret and decode a frame in the correct manner using the 3DV prefix NAL unit.

As an example, the MVC view_id of each 3DV layer can be set as in Table 5. Thus, in the architecture of embodiment 1, any NAL unit with MVC view_id equal to 4 shall be preceded by a prefix NAL unit 16 with 3dv_view_id set as 2 and 3dv_layer_id set as 0. The actual values allocated here are arbitrary and can be varied as long as the different 3DV views, each corresponding to a different perspective or view point, are uniquely identified and their corresponding 3DV layers are adequately identified and conveyed. It should also be noted that the values in Table 5 are consistent across different times. For example, views 1104 and 1112 share the same MVC view, 3DV view and 3DV layer IDs.

TABLE 5 Example of MVC view_id in Embodiment 1
MVC view_id | 3dv_view_id | 3dv_layer_id | Description
0 | 0 | 0 | 2D video
1 | 0 | 1 | Depth
2 | 1 | 0 | 2D video
3 | 1 | 1 | Depth
4 | 2 | 0 | 2D video
5 | 2 | 1 | Depth
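For illustration only, the pairing rule of embodiment 1 (a NAL unit carrying a given MVC view_id is preceded by a prefix NAL unit 16 carrying the matching 3dv_view_id and 3dv_layer_id) can be checked against the example allocation of Table 5 as sketched below. The names TABLE_5 and prefix_matches are hypothetical, and the table contents are only the example values given above.

```python
# Example allocation of Table 5: MVC view_id -> (3dv_view_id, 3dv_layer_id).
TABLE_5 = {
    0: (0, 0),  # 2D video
    1: (0, 1),  # Depth
    2: (1, 0),  # 2D video
    3: (1, 1),  # Depth
    4: (2, 0),  # 2D video
    5: (2, 1),  # Depth
}


def prefix_matches(prefix_view_id, prefix_layer_id, mvc_view_id, table=TABLE_5):
    """Return True if the IDs of the preceding NAL unit 16 agree with the table."""
    return table.get(mvc_view_id) == (prefix_view_id, prefix_layer_id)


if __name__ == "__main__":
    # MVC view_id 4 is expected after a prefix with 3dv_view_id 2, 3dv_layer_id 0.
    print(prefix_matches(2, 0, 4))
```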

It should be understood that the bitstream defined in embodiment 1 is MVC compatible and every 3DV view and all of its layers can be decoded by a conventional MVC decoder. Thus, the 3DV prefix NAL unit 16 permits MVC compatible decoders to decode all transmitted 3DV layers, including the 3DV supplemental layers. However, although a conventional MVC decoder would not be aware of how to organize the decoded data into a 3DV format, use of the NAL unit 16 permits synchronization of 3DV views and layers at a coding level by embodiments. For example, 3DV reference buffer 316 of encoder 300 illustrated in FIG. 3 can include appropriate 3DV prefix units, in accordance with the above-disclosed teaching, in bitstream 318, while 3DV reference buffer 414 of decoder 400 of FIG. 4 can interpret the NAL units in bitstream 318 and construct and format 3DV content using the NAL units accordingly, so that they conform to the structures discussed with respect to FIGS. 10 and 11 above.

It should be noted that the MVC backward compatibility is achieved in that every 2D view layer of a 3DV view can be decoded and formatted by a conventional MVC decoder in accordance with MVC. However, because the depth layers and other 3DV supplemental layers would include their own unique MVC view ID, the 3DV supplemental layers would be interpreted by an MVC decoder as a separate MVC view. Thus, if 3DV supplemental layers were formatted and displayed in accordance with MVC, the displayed image would ordinarily not have a three-dimensional effect. As such, a user can search through and attempt to display MVC views until a viewable 3D image is presented. Here, a viewable 3D view would be presented whenever two 2D view layers are selected/displayed and presented to each eye of a user.

Additionally, a user may also be able to view 3D images if the user's display is configured to accept the 3DV supplemental layers as transmitted using, for example, Embodiment 1, and produce 3D images. For example, a user's display may accept LDV formatted input and produce 3D images from that input. In such a case, a user may, for example, select a mode on the user's display to indicate that the input is in LDV format.

Embodiment 2 Reusing MVC view_id under 3DV

In accordance with an embodiment 2, as an alternative implementation of embodiment 1, novel encoding and decoding processes on the NAL unit header are proposed. Here, the details provided above with regard to embodiment 1 apply to embodiment 2, except that a specific numbering method involving the MVC view_id is employed so that use of the 3DV prefix NAL unit 16 is avoided. For example, as the MVC view_id is defined to have 10 bits, the 3 least significant bits of the MVC view_id can indicate the 3dv_layer_id and the 7 most significant bits of the MVC view_id can indicate the 3dv_view_id. Consequently, the MVC view_id in Table 5 can be set as in Table 6 below. Thus, the 3DV content provided in FIG. 11 would be the same for embodiment 2 except that the NAL unit 16 would not be present in embodiment 2 and the decoder can store and use Table 6 to determine 3DV view IDs and 3DV layer IDs from extracted MVC view IDs in the bitstreams by cross-referencing the extracted MVC view IDs to 3DV view IDs and 3DV layer IDs. Accordingly, the NAL prefix unit 14 and/or the NAL unit 20 can be configured in accordance with a numbering method involving the MVC view ID. Here, as discussed above, the MVC view ID can be employed to convey the 3DV view ID and the 3DV layer ID to permit synchronization and formatting of 3DV content at the coding level.

TABLE 6 Example of MVC view_id in Embodiment 2
MVC view_id | 3dv_view_id | 3dv_layer_id | Description
0 | 0 | 0 | 2D video
1 | 0 | 1 | Depth
8 | 1 | 0 | 2D video
9 | 1 | 1 | Depth
16 | 2 | 0 | 2D video
17 | 2 | 1 | Depth
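The numbering method of embodiment 2 can be sketched, in a non-limiting manner, as a pair of bit operations on the 10-bit MVC view_id: the 7 most significant bits carry the 3dv_view_id and the 3 least significant bits carry the 3dv_layer_id. The function names below are hypothetical.

```python
LAYER_BITS = 3   # 3 least significant bits: 3dv_layer_id
VIEW_BITS = 7    # 7 most significant bits: 3dv_view_id


def make_mvc_view_id(view_id_3dv, layer_id_3dv):
    """Combine a 3dv_view_id and a 3dv_layer_id into a 10-bit MVC view_id."""
    if not 0 <= view_id_3dv < (1 << VIEW_BITS):
        raise ValueError("3dv_view_id must fit in 7 bits")
    if not 0 <= layer_id_3dv < (1 << LAYER_BITS):
        raise ValueError("3dv_layer_id must fit in 3 bits")
    return (view_id_3dv << LAYER_BITS) | layer_id_3dv


def split_mvc_view_id(mvc_view_id):
    """Return (3dv_view_id, 3dv_layer_id) for a 10-bit MVC view_id."""
    if not 0 <= mvc_view_id < (1 << (VIEW_BITS + LAYER_BITS)):
        raise ValueError("MVC view_id must fit in 10 bits")
    return mvc_view_id >> LAYER_BITS, mvc_view_id & ((1 << LAYER_BITS) - 1)


if __name__ == "__main__":
    for mvc_view_id in (0, 1, 8, 9, 16, 17):
        print(mvc_view_id, split_mvc_view_id(mvc_view_id))
```

Applied to the MVC view_id values 0, 1, 8, 9, 16, and 17, the split reproduces the rows of Table 6, so a decoder following this sketch would not need to store the table explicitly.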

Embodiment 3 3DV NAL Unit Extension

In embodiments 1 and 2, certain MVC coding techniques were used to code all the 3DV layers and, as such, all the 3DV layers were decodable by a conventional MVC decoder. However, a conventional MVC decoder implementing the current MVC standard does not compose each of the various 3DV layers into a 3DV format, as discussed above. In Embodiment 3, a coding framework is proposed that permits the introduction of additional coding techniques that are not part of the current MVC standard and that are applicable to certain 3DV views and/or certain 3DV layers.

To achieve this goal, a novel NAL unit type, referred to herein as “21,” as shown in Table 2 above, can be employed. Similar to NAL unit 16, the reference number chosen for the novel NAL unit of embodiment 3 can be any number reserved by the AVC draft in Table 3. Here, any 3DV view and/or 3DV layer that need not be decoded by an MVC decoder can use NAL unit type 21 to decode 3DV content. Further, all the 2D view layers that can be decoded and properly interpreted by an MVC decoder can be coded in conventional NAL unit types, such as 1, 5, and 20, as discussed above, and are referred to as MVC compatible 2D views. MVC compatible 2D views can be preceded by a 3DV prefix NAL unit, such as NAL unit 16, as described with respect to Embodiment 1; or an MVC view_id numbering method can be specified so as to avoid the 3DV prefix NAL unit, as described with respect to Embodiment 2.

Similar to the AVC draft MVC NAL unit header extension, provided below in Table 7, a novel 3DV NAL unit header extension is proposed and provided in Table 8 below.

TABLE 7 NAL unit header MVC extension
nal_unit_header_mvc_extension( ) { | C | Descriptor
  non_idr_flag | All | u(1)
  priority_id | All | u(6)
  view_id | All | u(10)
  temporal_id | All | u(3)
  anchor_pic_flag | All | u(1)
  inter_view_flag | All | u(1)
  reserved_one_bit | All | u(1)
} | |

TABLE 8 NAL unit header 3DV extension
nal_unit_header_3dv_extension( ) { | C | Descriptor
  non_idr_flag | All | u(1)
  priority_id | All | u(6)
  3dv_view_id | All | u(7)
  3dv_layer_id | All | u(3)
  temporal_id | All | u(3)
  anchor_pic_flag | All | u(1)
  inter_view_flag | All | u(1)
  reserved_one_bit | All | u(1)
} | |

As shown in Tables 7 and 8, the 3DV NAL unit header extension can include the same syntax elements as the MVC NAL unit header extension, except that the view_id syntax element of the MVC NAL unit header extension is replaced by two syntax elements, 3dv_view_id and 3dv_layer_id, in the 3DV NAL unit header extension. Here, in embodiment 3, 3dv_view_id specifies a 3DV view ID number of the associated frame. The same 3dv_view_id is shared among 3DV view layers from the same view position. In turn, 3dv_layer_id specifies the 3DV layer ID number of the associated frame. The call for nal_unit_header_3dv_extension( ) is shown in Table 9 below.
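As a non-limiting sketch, the syntax elements of Table 8 can be read in order from a string of bits as shown below. The names FIELDS_3DV and parse_fields are hypothetical. The listed elements total 23 bits; the sketch assumes only that the supplied bit string begins at non_idr_flag and that each element is an unsigned value written most significant bit first, and it makes no assumption about how those bits are aligned within the header bytes.

```python
# (name, width in bits) for each syntax element of Table 8, in order.
FIELDS_3DV = [
    ("non_idr_flag", 1),
    ("priority_id", 6),
    ("3dv_view_id", 7),
    ("3dv_layer_id", 3),
    ("temporal_id", 3),
    ("anchor_pic_flag", 1),
    ("inter_view_flag", 1),
    ("reserved_one_bit", 1),
]


def parse_fields(bits, fields=FIELDS_3DV):
    """Parse a string of '0'/'1' characters into a dict of unsigned fields."""
    values = {}
    pos = 0
    for name, width in fields:
        chunk = bits[pos:pos + width]
        if len(chunk) < width:
            raise ValueError("bit string too short for " + name)
        values[name] = int(chunk, 2)
        pos += width
    return values


if __name__ == "__main__":
    # non_idr=1, priority=0, view=2, layer=1, temporal=0, anchor=1, inter_view=0, reserved=1
    example = "1" + "000000" + "0000010" + "001" + "000" + "1" + "0" + "1"
    print(parse_fields(example))
```

The same function could be given the field list of Table 7, with a single 10-bit view_id entry in place of 3dv_view_id and 3dv_layer_id, to read the MVC extension.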

TABLE 9 NAL unit syntax
nal_unit( NumBytesInNALunit ) { | C | Descriptor
  forbidden_zero_bit | All | f(1)
  nal_ref_idc | All | u(2)
  nal_unit_type | All | u(5)
  NumBytesInRBSP = 0 | |
  nalUnitHeaderBytes = 1 | |
  if( nal_unit_type = = 14 || nal_unit_type = = 20 ) { | |
    svc_extension_flag | All | u(1)
    if( svc_extension_flag ) | |
      nal_unit_header_svc_extension( ) | All |
    else | |
      nal_unit_header_mvc_extension( ) | All |
    nalUnitHeaderBytes += 3 | |
  } | |
  if( nal_unit_type = = 21 ) { | |
    nal_unit_header_3dv_extension( ) | |
    nalUnitHeaderBytes += 3 | |
  } | |
  for( i = nalUnitHeaderBytes; i < NumBytesInNALunit; i++ ) { | |
    if( i + 2 < NumBytesInNALunit && next_bits( 24 ) = = 0x000003 ) { | |
      rbsp_byte[ NumBytesInRBSP++ ] | All | b(8)
      rbsp_byte[ NumBytesInRBSP++ ] | All | b(8)
      i += 2 | |
      emulation_prevention_three_byte /* equal to 0x03 */ | All | f(8)
    } else | |
      rbsp_byte[ NumBytesInRBSP++ ] | All | b(8)
  } | |
} | |

Here, the if( nal_unit_type = = 21 ) { . . . } statement has been added to the NAL unit syntax described in the AVC draft.
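For illustration, the byte-level flow of Table 9 can be sketched as follows: the first byte yields forbidden_zero_bit, nal_ref_idc, and nal_unit_type; three further header bytes are counted for NAL unit types 14, 20, and 21; and the remaining bytes are copied to the RBSP while each emulation_prevention_three_byte is dropped. The function name parse_nal_unit is hypothetical, and the sketch does not interpret the contents of the header extension bytes.

```python
def parse_nal_unit(nal):
    """Split a NAL unit (bytes) into first-byte fields, header size and RBSP."""
    if not nal:
        raise ValueError("empty NAL unit")
    first = nal[0]
    header = {
        "forbidden_zero_bit": first >> 7,
        "nal_ref_idc": (first >> 5) & 0x3,
        "nal_unit_type": first & 0x1F,
    }
    header_bytes = 1
    # Types 14 and 20 carry an SVC/MVC extension, type 21 the 3DV extension.
    if header["nal_unit_type"] in (14, 20, 21):
        header_bytes += 3
    rbsp = bytearray()
    i = header_bytes
    while i < len(nal):
        if i + 2 < len(nal) and nal[i:i + 3] == b"\x00\x00\x03":
            rbsp += nal[i:i + 2]   # keep the two zero bytes
            i += 3                 # skip emulation_prevention_three_byte
        else:
            rbsp.append(nal[i])
            i += 1
    return header, header_bytes, bytes(rbsp)


if __name__ == "__main__":
    # 0x75 = 0 11 10101: nal_ref_idc 3, nal_unit_type 21.
    unit = bytes([0x75, 0x00, 0x00, 0x00]) + b"\x00\x00\x03\x01\xaa"
    print(parse_nal_unit(unit))
```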

An example of a NAL unit stream 1200 in accordance with embodiment 3 is provided in FIG. 12, where the new NAL unit type 21 is employed. Here, use of a 3DV prefix NAL unit type is avoided, as the view_id numbering is specified in the NAL unit header parsing process. NAL unit stream 1200 is an illustration of the application of embodiment 3 to the 3DV content example provided in FIG. 10. As discussed above, different variations of dependencies between 3DV views and 3DV layers and of the 3DV layers used can be different in accordance with various implementations.

Similar to stream 1100, stream 1200 can include different sets of views for different times, with views 1204, 1206 and 1208 corresponding to T0 (1202) and views 1212, 1214 and 1216 corresponding to time T1 (1210). View 1204 and view 1212 (3DV View 0) correspond to base view 1002 at times T0 and T1, respectively, in that they are associated with the same perspective or viewpoint as base view 1002. Similarly, view 1206 and view 1214 (3DV view 2) correspond to P view 1006 at times T0 and T1, respectively, while view 1208 and view 1216 (3DV view 1) correspond to B view 1004 at times T0 and T1, respectively. Each 3DV view is composed of a 2D view layer and a depth layer. As for stream 1100, it should be understood that additional supplemental layers can be employed in other embodiments. View 1204 is composed of a 2D view layer 1218 and a depth layer 1220. In turn, the 2D view layer 1218 is composed of NAL units 14 (1226) and 5 (1230), while the depth layer 1220 is composed of NAL units 21 (1230). Further, view 1206 is composed of 2D view 1222, which includes NAL units 20, and a depth view 1224 composed of NAL units 21. In addition, 2D view 1236 of view 1212 is composed of NAL units 14 and 1.

NAL units 1, 5, 14 and 20 have been described above with respect to FIG. 11. NAL unit 21 employs a 3DV NAL unit header extension of Table 8 as opposed to an MVC NAL unit header extension of Table 7 used by NAL units 14 and 20. Use of the novel 3DV NAL unit header extension enables synchronization of 3DV layers into a 3DV content format at the coding level while permitting the application of new coding methods. Different from NAL unit 16, the NAL unit 21 can include a payload of corresponding video data. More generally, the payload can include picture data, which generally refers to data for a corresponding encoded picture. The picture data may be from any layer, such as, for example, 2D video, depth, occlusion video, occlusion depth, or transparency.

It should also be noted that similar to FIG. 11, the arrows of FIG. 12 indicate the transmission order of NAL units. Moreover, NAL units 1, 5, 20, and 21 in FIG. 12 are truncated in the same way in which NAL units 1, 5 and 20 of FIG. 11 are truncated. Further, embodiment 3 is MVC compatible in that 2D view layers can be decoded by a conventional decoder and combined in accordance with MVC to permit the generation and display of 3D content.

Turning now to FIGS. 13 and 14, methods 1300 and 1400 for decoding and encoding, respectively, a 3DV content stream in accordance with embodiment 3 are illustrated. It should be understood that method 1300 can be performed by and implemented in decoder 400 of FIG. 4, while method 1400 can be performed by and implemented in encoder 300 of FIG. 3. Both methods 1300 and 1400 employ the syntax provided above in Table 9.

Method 1300 can begin at step 1302 in which the decoder 400 can read the nal_ref_idc, described above in Table 9 and also in the AVC draft, of a received NAL unit.

At step 1304, the decoder 400 can read the NAL unit type.

At step 1306, the decoder 400 can determine whether the NAL unit type is 14. If the NAL unit type is 14, then the decoder 400 can proceed to step 1308 and parse the remaining portion of the currently processed NAL unit to obtain the MVC view ID. In this particular implementation of embodiment 3, the 3DV view ID and the 3DV layer ID are indicated by the MVC view ID, for example, as described above with respect to embodiment 2.

Thus, at step 1310, the decoder 400 can obtain the 3DV view ID and the 3DV layer ID from the MVC view ID, as discussed above, for example, with respect to embodiment 2.

At step 1312, the decoder 400 can read and parse the next NAL unit received. The next NAL unit should be either of type 1 or of type 5. Thus, if the decoder determines that the next NAL unit is not of type 1 or of type 5, then an error has occurred.

At step 1314, the decoder 400 can decode the current slice data of the currently processed NAL unit.

At step 1316, the decoder 400 can determine whether the processed NAL unit corresponds to the end of the current frame. If the processed NAL unit does not correspond to the end of the current frame, then steps 1312-1316 may be repeated by the decoder 400.

After the end of the current frame is reached, then the method may proceed to step 1318, in which the decoder 400 may send the decoded frame with its 3DV view ID and its 3DV layer ID to its output buffer, such as, for example, 3DV Reference/Output Buffer 414, which in turn, may configure the frame in a 3DV format for display, as discussed above.

At step 1320, the decoder 400 may determine whether the end of the bitstream or sequence has been reached. If the end of the bitstream or sequence has not been reached, then the method may proceed to step 1302 and the decoder 400 may repeat method 1300. If the end of the bitstream or sequence is reached, then method 1300 may end.

Returning to step 1306, if decoder 400 determines that the NAL unit type of the currently processed NAL unit is not of type 14, then the method may proceed to step 1322, in which the decoder 400 may determine whether the NAL unit type of the currently processed NAL unit is 20. If the currently processed NAL unit is of type 20, then the method may proceed to step 1324, in which decoder 400 can parse the remaining portion of the currently processed NAL unit to obtain the MVC view ID. In this particular implementation of embodiment 3, the 3DV view ID and the 3DV layer ID are indicated by the MVC view ID, for example, as described above with respect to embodiment 2.

Accordingly, at step 1326, the decoder 400 can obtain the 3DV view ID and the 3DV layer ID from the MVC view ID, as discussed above, for example, with respect to embodiment 2.

At step 1328, the decoder 400 can decode the current slice data of the currently processed NAL unit.

At step 1330, the decoder 400 can determine whether the processed NAL unit corresponds to the end of the current frame. If the processed NAL unit does not correspond to the end of the current frame, then the method may proceed to step 1332, in which the decoder 400 can read and parse the next NAL unit received. The next NAL unit should be of type 20. Thus, if the decoder determines that the next NAL unit is not of type 20, then an error has occurred. Thereafter, steps 1326-1330 may be repeated by the decoder 400.

If, at step 1330, the decoder 400 determines that the end of the current frame is reached, then the method may proceed to step 1318, in which the decoder 400 may send the decoded frame with its 3DV view ID and its 3DV layer ID to its output buffer, as discussed above. Thereafter, the method may proceed to step 1320 and may be repeated or terminated, as discussed above.

Returning to step 1322, if the decoder 400 determines that the currently processed NAL unit is not of type 20, then the method may proceed to step 1334, in which the decoder determines whether the NAL unit currently processed is of type 21. If the NAL unit currently processed is of type 21, then the method may proceed to step 1336 in which the decoder 400 may parse the remaining portion of the currently processed NAL unit and obtain the 3DV view ID and the 3DV layer ID provided by the 3DV NAL unit header extension.

At step 1338, the decoder 400 can decode the current slice data of the currently processed NAL unit.

At step 1340, the decoder 400 can determine whether the processed NAL unit corresponds to the end of the current frame. If the processed NAL unit does not correspond to the end of the current frame, then the method may proceed to step 1342, in which the decoder 400 can read and parse the next NAL unit received. The next NAL unit should be of type 21. Thus, if the decoder determines that the next NAL unit is not of type 21, then an error has occurred. Thereafter, steps 1338-1340 may be repeated by the decoder 400.

If, at step 1340, the decoder 400 determines that the end of the current frame is reached, then the method may proceed to step 1318, in which the decoder 400 may send the decoded frame with its 3DV view ID and its 3DV layer ID to its output buffer, as discussed above. Thereafter, the method may proceed to step 1320 and may be repeated or terminated, as discussed above.

Returning to step 1334, if the decoder 400, at step 1334, determines that the currently processed NAL unit is not of type 21, then the method may proceed to step 1344 in which the remaining portion of the currently processed NAL unit is parsed, which may be intended for the sequence parameter set (SPS), the picture parameter set (PPS) or for other purposes. Thereafter, the method may proceed to step 1320 and may be repeated or terminated, as discussed above.
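The branching of method 1300 on the NAL unit type can be illustrated with a minimal sketch in Python. It is not the decoder 400 itself: the NalUnit tuple, the helper names and the returned labels are hypothetical and introduced only for illustration. The type numbers 14, 20 and 21 come from the description above; the set of slice types allowed after a prefix NAL unit (1 or 5) is an assumption taken from the AVC slice types used in stream 1200 and at encoder step 1414.

```python
from typing import NamedTuple, Optional


class NalUnit(NamedTuple):
    """Hypothetical, already-parsed NAL unit (not a real bitstream parser)."""
    nal_unit_type: int


# NAL unit types expected for the remaining units of a frame, keyed by the
# type of the unit that opened the frame. The {1, 5} entry is an assumption:
# a type-14 prefix is followed by AVC slices of type 1 or 5.
EXPECTED_NEXT = {14: {1, 5}, 20: {20}, 21: {21}}


def id_source(nal_unit_type: int) -> Optional[str]:
    """Report where the 3DV view ID and 3DV layer ID are obtained from."""
    if nal_unit_type in (14, 20):
        return "mvc_view_id"            # steps 1308-1310 and 1324-1326
    if nal_unit_type == 21:
        return "3dv_header_extension"   # step 1336
    return None                         # step 1344: SPS, PPS or other data


def check_frame(units: list[NalUnit]) -> str:
    """Validate the NAL unit types of one frame and return the ID source."""
    first = units[0].nal_unit_type
    source = id_source(first)
    if source is None:
        return "non-slice"
    for unit in units[1:]:
        if unit.nal_unit_type not in EXPECTED_NEXT[first]:
            # Corresponds to the error condition of steps 1312, 1332 and 1342.
            raise ValueError(
                f"unexpected NAL unit type {unit.nal_unit_type} after {first}"
            )
    return source
```

A table keyed by the opening NAL unit type keeps the three branches of FIG. 13 in one place, so that supporting a further NAL unit type would only require a new table entry.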

Referring again to FIG. 14, method 1400 for encoding a 3DV content stream in accordance with embodiment 3 may begin at step 1402, in which the encoder 300 may read its configuration profile.

At step 1404, the encoder 300 may write SPS and/or PPS NAL units.

At step 1406, the encoder 300 may read the next frame to encode.

At step 1408, the encoder 300 may determine whether the currently processed frame is to be an AVC compatible view. If the currently processed frame is to be an AVC compatible view, then the method may proceed to step 1410, in which the encoder 300 can encode the next slice of the current frame.

At step 1412, if the currently processed slice of the current frame is the first slice of the current frame, as determined by encoder 300, then the encoder 300 may write an MVC prefix NAL unit with a NAL unit type of, for example, 14.

At step 1414, the encoder 300 can encapsulate the current slice into a NAL unit, such as for example, a NAL unit of type 1 or 5.

At step 1416, the encoder 300 can write the NAL unit in which the current slice is encapsulated at step 1414.

At step 1418, the encoder 300 can determine whether it has reached the end of the current frame. If the encoder has not reached the end of the current frame, then the method may proceed to step 1410 and the encoder 300 may repeat steps 1410-1418. If the encoder has reached the end of the current frame, then the method may proceed to step 1420, in which the encoder 300 can determine whether all the frames have been processed for a sequence or bitstream. If all of the frames have been processed, then the method may end. Otherwise, the method may proceed to step 1406 and the encoder may repeat steps 1406 and 1408.

Returning to step 1408, introduced above, if the encoder 300 determines that the currently processed frame need not be an AVC compatible view, then the method may proceed to step 1422 in which the encoder 300 may determine whether the currently processed frame is to be an MVC compatible view. If the currently processed frame is to be an MVC compatible view, then the method may proceed to step 1424 in which the encoder 300 may encode the next slice of the currently processed frame.

At step 1426, the encoder may encapsulate the current slice into a NAL unit with a NAL unit type of, for example, 20.

At step 1428, the encoder 300 can write the NAL unit in which the current slice is encapsulated at step 1426.

At step 1430, the encoder 300 can determine whether it has reached the end of the current frame. If the encoder has not reached the end of the current frame, then the method may proceed to step 1424 and the encoder 300 may repeat steps 1424-1430. If the encoder has reached the end of the current frame, then the method may proceed to step 1420, in which the encoder 300 can determine whether all the frames have been processed for a sequence or bitstream. If all of the frames have been processed, then the method may end. Otherwise, the method may proceed to step 1406 and the encoder may repeat steps 1406 and 1408.

Returning to step 1422, if the encoder 300 determines that the currently processed frame need not be an MVC compatible view, then the method may proceed to step 1432, in which encoder 300 may encode the next slice of the current frame.

At step 1434, the encoder may encapsulate the current slice into a NAL unit with a NAL unit type of, for example, 21.

At step 1436, the encoder 300 can write the NAL unit in which the current slice is encapsulated at step 1434.

At step 1440, the encoder 300 can determine whether it has reached the end of the current frame. If the encoder has not reached the end of the current frame, then the method may proceed to step 1432 and the encoder 300 may repeat steps 1432-1440. If the encoder has reached the end of the current frame, then the method may proceed to step 1420, in which the encoder 300 can determine whether all the frames have been processed for a sequence or bitstream. If all of the frames have been processed, then the method may end. Otherwise, the method may proceed to step 1406 and the encoder may repeat steps 1406 and 1408.
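The choice of NAL unit types made by method 1400 can be summarized by a short sketch in Python. The function name, the compatibility labels "avc", "mvc" and "3dv", and the idr parameter are hypothetical. Mapping IDR slices to type 5 and non-IDR slices to type 1 follows the AVC NAL unit type codes, and writing the prefix NAL unit once per frame follows step 1412 as described above.

```python
def nal_types_for_frame(compat: str, num_slices: int, idr: bool = False) -> list[int]:
    """Return the NAL unit types written for one frame, in transmission order.

    compat is a hypothetical label for the outcome of steps 1408 and 1422:
    "avc" (AVC compatible view), "mvc" (MVC compatible view) or "3dv".
    """
    if num_slices < 1:
        raise ValueError("a frame needs at least one slice")
    if compat == "avc":
        # Step 1412: one MVC prefix NAL unit (type 14) before the first slice.
        # Step 1414: each slice goes into a NAL unit of type 1 or 5.
        slice_type = 5 if idr else 1
        return [14] + [slice_type] * num_slices
    if compat == "mvc":
        return [20] * num_slices   # step 1426
    if compat == "3dv":
        return [21] * num_slices   # step 1434
    raise ValueError(f"unknown compatibility label: {compat}")
```

For example, a two-slice, non-IDR frame of an AVC compatible view yields the sequence 14, 1, 1, which matches the layout of 2D view 1236 described with respect to FIG. 12.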

It should be understood that the encoding steps 1410, 1424 and 1432 and decoding steps 1314, 1328 and 1338 can be performed in accordance with a variety of different coding methods and standards that permit conformance with the structures and features of embodiments discussed above with respect to, for example, FIGS. 10 and 12.

Moreover, with the introduction of new NAL unit type 21 for 3DV layers, special coding techniques can be defined for different 3DV layers which utilize their different characteristics. For example, the decoding of a 2D view may depend on the decoding of its depth map when the depth map is used to find a prediction block in a reference picture. Further, other such dependencies can be employed, as discussed above.

It should also be noted that with the novel NAL unit type 21, a 3DV view/layer can be coded with 3dv_slice_layer_extension_rbsp( ) as in Table 10, where 3dv_slice_header( ) and 3dv_slice_data( ) may include a modified slice_header( ) and slice_data( ).

TABLE 10 3DV slice layer

3dv_slice_layer_extension_rbsp( ) {      C        Descriptor
  3dv_slice_header( )                    2
  3dv_slice_data( )                      2|3|4
  rbsp_slice_trailing_bits( )            2
}

It should also be understood that, although embodiments 1-3 have been described separately, one or more of the embodiments can be combined in a variety of ways, as understood by those of ordinary skill in the relevant technical art in view of the teachings provided herein. For example, different slices of the same frame can be encoded in different ways. For example, certain slices of a frame can be encoded in an MVC compatible way according to embodiments 1 and/or 2, while other slices can be encoded using a non-MVC encoding mode in accordance with embodiment 3. In addition, MVC according to embodiments 1 and/or 2 can be employed for encoding certain layers of a 3DV view, such as, for example, a 2D view, while non-MVC modes according to embodiment 3 may be applied to encode other layers of the 3DV view, such as, for example, an occlusion view. Here, NAL units 16 with NAL units 1 and/or 5 may be applied to some layers of one or more 3DV views while NAL units 21 may be applied to other layers of one or more 3DV views.

Embodiment 4 Reference Picture List Construction

As indicated above, embodiments may be directed to a reference picture list construction process. In the embodiment discussed herein below, each picture has its own reference picture list. However, other implementations may provide reference picture lists that are specific to (and used for) multiple pictures. For example, a reference picture list may be allocated to an entire sequence of pictures in time, or an entire set of pictures across multiple views at a given point in time, or a subset of a picture. For example, a subset of a picture may be composed of a slice or a single macroblock or a sub-macroblock. The inputs of this reference picture list construction process are the inter_view_flag from the NAL unit header and view dependency information decoded from the sequence parameter set. It should be understood that both encoder 300 of FIG. 3 and decoder 400 of FIG. 4 can be configured to construct the reference picture list to encode and decode a bitstream, respectively, by employing the teachings described herein below.

In a first phase in the process, the temporal reference pictures and inter-view reference pictures may be inserted into an initial reference picture list, RefPicListX (with X being 0 or 1), as may be done, for example, in AVC or MVC systems. The RefPicListX as defined in the AVC draft can serve as an example initial reference picture list. For example, RefPicList0, with X being 0, can be used for the encoding or decoding of any type of predictively coded picture, while RefPicList1, with X being 1, can be used for the encoding or decoding of bi-directionally coded pictures or B pictures. Thus, a B picture may have two reference picture lists, RefPicList0 and RefPicList1, while other types of predictively coded pictures may have only one reference picture list, RefPicList0. Further, it should be noted that, here, a temporal reference corresponds to a reference to a picture that differs in time from the corresponding picture to which the reference list is allocated. For example, with reference to FIG. 11, a temporal reference may correspond to a reference to view 1104 for the encoding/decoding of view 1112. In turn, an inter-view reference may correspond to a reference to view 1104 for the encoding/decoding of view 1106. By inserting the temporal and inter-view reference pictures in a reference picture list, existing temporal and inter-view prediction techniques (for example, from AVC and/or MVC) are supported. As is known, AVC systems would include temporal reference pictures in the reference picture list, and MVC systems would further include inter-view reference pictures in the reference picture list.

A second phase in the process may comprise adding inter-layer reference pictures, which may be defined for each layer independently. One inter-layer prediction structure 1500 for embodiment 4 is provided in FIG. 15. The arrows in structure 1500 indicate the prediction direction. For example, the 2D video (view) layer 1502 (arrow from) of a particular view is used as reference for encoding the depth layer 1504 (arrow to) of the view. Accordingly, the inter-layer prediction structure may be used to determine which picture(s) may be used as a reference and, therefore, which picture(s) should be included in a reference picture list. In the structure 1500, the 2D video layer is also used as a reference for both the occlusion video layer 1506 and for the transparency layer 1510. In addition, the depth layer 1504 is used as a reference for the occlusion depth layer 1508.

As depicted in FIG. 15, for the inter-layer prediction structure 1500, each 3DV layer has at most one inter-layer reference. To encode a given layer, a layer with similar characteristics is used as reference. For example, with reference again to FIG. 2, the occlusion video layer 206 includes the background of the 2D video layer 202 while the occlusion depth layer 208 includes the background of the depth layer 204. Thus, to better exploit redundancy across layers, implementations may use the 2D video layer of a view as a reference for an occlusion layer of the view and may use a depth layer of the view as a reference for an occlusion depth layer of the view. Other implementations may permit multiple inter-layer references for a given 3DV layer.

For the 2D video layer picture, nothing need be done in the second phase, as inter-layer references need not be used in implementations for the 2D video layer picture. Other embodiments may indeed provide for inter-layer references for the 2D video layer. For example, the occlusion layer of a given view may be used as a reference for the 2D video layer of that view. An advantage of avoiding the use of inter-layer references for the 2D view layers is that all the 2D view layers may be decoded by a conventional MVC decoder. It should be noted that in other implementations, a warped picture such as, for example, a synthesized virtual reference picture, can be appended to the reference list. With regard to the position of the warped picture reference in the reference list, the warped picture reference can be inserted at the beginning of the initial reference list when the synthesis quality is high, or at the end of the reference list when the synthesis quality is moderate. Use of the warped picture in this way can improve coding efficiency.

Returning to FIG. 15, for the depth layer picture 1504, the 2D video layer picture 1502 (shown as the reference for the depth layer in FIG. 15) may be appended to the end of RefPicListX in the second phase. In various implementations, the 2D video picture reference is appended at the end of the reference list, rather than at the beginning of the reference list, because it is expected to have the least redundancy (compared to any of the first phase's temporal and inter-view references) and is expected to be the least likely to be used as a reference. Thus, here, the inter-layer reference is provided after any temporal and inter-view references in the reference picture list.

For the occlusion video layer picture 1506, the 2D video layer picture 1502 can be prepended to RefPicListX in the second phase. The 2D video picture can be placed at the beginning, before any temporal and inter-view references in the reference picture list, rather than at the end or in the middle, because the 2D video picture is expected to have the most redundancy of the available reference pictures and to be the most likely to be used as a reference.

For the occlusion depth layer picture 1508, the depth picture 1504 can be prepended to RefPicListX in the second phase, before any temporal and inter-view references in the reference picture list, due to a high level of redundancy expected (compared to any of the first phase's temporal and inter-view references) between the occlusion depth layer and the depth layer.

For the transparency layer picture 1510, the 2D video layer picture 1502 can be appended to the end of RefPicListX, after any temporal and inter-view references in the reference picture list, in the second phase due to a low level of redundancy (compared to any of the first phase's temporal and inter-view references) expected between the transparency layer and the 2D video layer.
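The second-phase placement rules for structure 1500 can be captured in a small lookup table, as in the following Python sketch. The layer labels, the INTER_LAYER_RULE table and the function name are hypothetical, and pictures are represented by plain strings; only the choice of reference layer and of list position for each layer comes from the description above.

```python
# Inter-layer rule for each layer of structure 1500:
# (layer used as reference, position in RefPicListX), or None for no reference.
INTER_LAYER_RULE = {
    "2d_video": None,
    "depth": ("2d_video", "end"),
    "occlusion_video": ("2d_video", "begin"),
    "occlusion_depth": ("depth", "begin"),
    "transparency": ("2d_video", "end"),
}


def add_inter_layer_reference(initial_list, layer, same_view_pictures):
    """Second phase: return RefPicListX with the inter-layer reference added.

    initial_list holds the temporal and inter-view references of phase one.
    same_view_pictures maps a layer label to the picture of that layer in
    the same 3DV view and at the same time instance.
    """
    ref_list = list(initial_list)        # leave the caller's list untouched
    rule = INTER_LAYER_RULE.get(layer)
    if rule is None:
        return ref_list                  # 2D video or unknown layer: no change
    ref_layer, position = rule
    reference = same_view_pictures[ref_layer]
    if position == "begin":
        ref_list.insert(0, reference)    # high expected redundancy
    else:
        ref_list.append(reference)       # low expected redundancy
    return ref_list
```

For a B picture, the same function would be called once for RefPicList0 and once for RefPicList1.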

More generally, inter-layer references for a picture can be inserted into the reference picture list for that picture at a position determined by how frequently that reference is used. For implementations in which a priority is assigned to each reference, the priority may be assigned based on how frequently that reference is used. As an example, one implementation encodes a picture by macroblocks, and each macroblock may or may not use a given reference from the reference picture list. For each macroblock of this implementation, a rate-distortion optimization is performed among various coding options, including different coding modes and different references. Thus, a given inter-layer reference might only be used in coding a subset of the macroblocks of the picture. The priority assigned to the given inter-layer reference may be determined based upon how many macroblocks use the inter-layer reference, as compared to how many macroblocks use the other references available in the reference picture list.
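One way to turn macroblock usage counts into a list order is sketched below in Python. The function name and the representation of the per-macroblock choices are hypothetical, and the text above does not state how the usage statistics are gathered; the sketch simply assumes they are available, for example from a previously coded picture or an earlier coding pass.

```python
from collections import Counter


def order_by_usage(references, macroblock_choices):
    """Order references so that the most frequently used one comes first.

    references is the current reference picture list.
    macroblock_choices holds, for each macroblock, the reference it selected
    after rate-distortion optimization, or None if it used no reference
    (for example, an intra-coded macroblock).
    """
    usage = Counter(choice for choice in macroblock_choices if choice is not None)
    # sorted() is stable, so references with equal counts keep their order.
    return sorted(references, key=lambda ref: -usage[ref])
```

With references "t0", "iv0" and "il0" and macroblock choices in which "il0" is selected three times, "t0" twice and "iv0" once, the inter-layer reference "il0" moves to the front of the list.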

With reference now to FIGS. 16 and 17, methods 1600 and 1700 for constructing a reference picture list for an encoding and decoding process, respectively, are illustrated. The method 1600 for constructing a reference picture list for an encoding process in accordance with one implementation of embodiment 4 may be performed by encoder 300 of FIG. 3. For example, the 3DV Reference Buffer 316 may be configured to implement method 1600.

Method 1600 may begin at step 1602, in which the encoder 300 may initialize the reference picture list, RefPicListX. As noted above, the RefPicListX may be initialized in accordance with the AVC draft, with X being 0 or 1. For example, as indicated above, temporal and/or inter-view reference pictures may be inserted into the initial reference picture list.

At step 1604, the encoder 300 can determine whether the reference picture list is for a 2D video layer picture. If the reference picture list is for a 2D video layer picture, then the method may proceed to step 1622, at which the encoder 300 may continue encoding the slice currently being processed. Thereafter, the method may end or the method may repeat to construct a reference picture list for another 3DV layer picture. Alternatively, if the 3DV layer picture is a B picture, the method may repeat for the same 3DV layer picture to construct RefPicList1.

If, at step 1604, the encoder 300 determines that the reference picture list is not for a 2D video layer picture, the method may proceed to step 1606, in which the encoder 300 may determine whether the reference picture list is for a depth layer picture. If the reference picture list is for a depth layer picture, then the method may proceed to step 1608, in which the 2D video layer picture from the same 3D view as the depth layer picture is appended to the end of the reference picture list. Thereafter, the method may proceed to step 1622, at which the encoder 300 may continue encoding the slice currently being processed. The method may then end or may repeat to construct a reference picture list for another 3DV layer picture. Alternatively, if the 3DV layer picture is a B picture, the method may repeat for the same 3DV layer picture to construct RefPicList1.

If, at step 1606, the encoder 300 determines that the reference picture list is not for a depth layer picture, the method may proceed to step 1610, in which the encoder 300 may determine whether the reference picture list is for an occlusion video layer picture. If the reference picture list is for an occlusion video layer picture, then the method may proceed to step 1612, in which the 2D video layer picture from the same 3D view as the occlusion video layer picture is appended to the beginning of the reference picture list. Thereafter, the method may proceed to step 1622, at which the encoder 300 may continue encoding the slice currently being processed. The method may then end or may repeat to construct a reference picture list for another 3DV layer picture. Alternatively, if the 3DV layer picture is a B picture, the method may repeat for the same 3DV layer picture to construct RefPicList1.

If, at step 1610, the encoder 300 determines that the reference picture list is not for an occlusion video layer picture, the method may proceed to step 1614, in which the encoder 300 may determine whether the reference picture list is for an occlusion depth layer picture. If the reference picture list is for an occlusion depth layer picture, then the method may proceed to step 1616, in which the depth layer picture from the same 3D view as the occlusion depth layer picture is appended to the beginning of the reference picture list. Thereafter, the method may proceed to step 1622, at which the encoder 300 may continue encoding the slice currently being processed. The method may then end or may repeat to construct a reference picture list for another 3DV layer picture. Alternatively, if the 3DV layer picture is a B picture, the method may repeat for the same 3DV layer picture to construct RefPicList1.

If, at step 1614, the encoder 300 determines that the reference picture list is not for an occlusion depth layer picture, the method may proceed to step 1618, in which the encoder 300 may determine whether the reference picture list is for a transparency layer picture. If the reference picture list is for a transparency layer picture, then the method may proceed to step 1620, in which the 2D video layer picture from the same 3D view as the transparency layer picture is appended to the end of the reference picture list. Thereafter, the method may proceed to step 1622, at which the encoder 300 may continue encoding the slice currently being processed. The method may then end or may repeat to construct a reference picture list for another 3DV layer picture. Alternatively, if the 3DV layer picture is a B picture, the method may repeat for the same 3DV layer picture to construct RefPicList1. Similarly, if at step 1618, the encoder 300 determines that the layer is not a transparency layer picture, then the method may proceed to step 1622, at which the encoder 300 may continue encoding the slice currently being processed. The method may then end or may repeat to construct a reference picture list for another 3DV layer picture. Alternatively, if the 3DV layer picture is a B picture, the method may repeat for the same 3DV layer picture to construct RefPicList1.

Turning now to method 1700 of FIG. 17, the method 1700 for constructing a reference picture list for a decoding process in accordance with one implementation of embodiment 4 may be performed by decoder 400 of FIG. 4. For example, the 3DV reference/output buffer 414 may be configured to perform method 1700.

Method 1700 may begin at step 1702, in which the decoder 400 may parse a received NAL unit and slice header to extract the 3DV layer identifier. For example, the NAL unit may be the 3DV prefix unit 16 discussed above with regard to embodiment 1, the NAL prefix unit 14 and/or the NAL unit 20 of embodiment 2, and/or the NAL unit 21 of embodiment 3. Further, as indicated above, other information that may be extracted by decoder 400 from a bitstream including 3DV content received by the decoder 400 may include an inter_view_flag from a NAL unit header and view dependency information decoded from the sequence parameter set. Thereafter, the reference picture list, RefPicListX, can be initialized. As noted above, the RefPicListX may be initialized in accordance with the AVC draft, with X being 0 or 1. For example, as indicated above, the inter_view_flag from NAL unit header and view dependency information decoded from the sequence parameter set may be employed to initialize the RefPicListX. In turn, temporal and/or inter-view reference pictures may be inserted into the initial reference picture list.

The remaining steps of method 1700 may be performed by the decoder 400 in the same manner discussed above with respect to method 1600, except that step 1622 is replaced with step 1722. For example, steps 1704-1720 may be performed by the decoder 400 in the same manner as steps 1604-1620 are performed by the encoder 300. However, at step 1722, the decoder continues to decode the currently processed slice as opposed to encoding the currently processed slice.

It should be understood that inter-layer prediction structures with inter-layer dependencies other than that described above with respect to FIG. 15 can be easily conceived by one of ordinary skill in the art using the teachings provided above with regard to embodiment 4.

Accordingly, embodiment 4 can support different types of inter-layer prediction. Further, embodiment 4 adapts a reference picture list to an inter-layer prediction structure such as, for example, the structure described above with respect to FIG. 15. Consequently, embodiment 4 provides a reference picture list that is based on an inter-layer prediction structure of a system, while at the same time permitting a conventional MVC decoder to extract 3DV content and format the content for display.

It should be noted that reference pictures can be organized so that they are compatible with an AVC system. For example, inter-layer and inter-view reference pictures can be multiplexed as temporally distinct pictures.

Embodiment 5 Novel NAL Unit Type for Subset SPS 3DV

As indicated above, in at least one embodiment, the SPS can be extended such that new sequence parameters for a 3DV format can be signaled. The extended SPS for 3DV is referred to herein below as the “subset SPS 3DV”. In embodiment 5, a novel NAL unit type for the subset SPS 3DV can be employed. In embodiments 6 and 7, discussed below, how the subset SPS 3DV may be composed is described. It should be understood that the proposed parameters are not limited to being within the SPS, but can also appear in a NAL unit header, a picture parameter set (PPS), supplemental enhancement information (SEI), a slice header, and any other high level syntax element. Embodiments may also use low-level syntax and out-of-band information.

Here, in embodiment 5, a novel NAL unit type can be used to indicate the subset SPS 3DV. The NAL unit type number in this embodiment may be any one of the values not allocated in Table 3 above, which, as stated above, has been transcribed from the AVC draft. Moreover, the novel NAL unit type number allocated for the VCL NAL units for 3DV layers should also be selected in a manner different from the novel NAL unit types described above with regard to embodiments 1 and 3. As a result, 17 is selected as the NAL unit type number for subset SPS 3DV, which is represented as subset_seq_parameter_set_3dv_rbsp( ) in Table 11, below. Of course, other NAL unit type numbers may be selected. If embodiments are not to be combined, then NAL unit types 16 or 21 could also be used instead of 17. The rows for nal_unit_type 17 and nal_unit_type 18 . . . 20 are newly added with respect to Table 2 above.

TABLE 11 NAL unit type codes, syntax element categories, and NAL unit type classes

nal_unit_type | Content of NAL unit and RBSP syntax structure | C | Annex A NAL unit type class | Annex G and Annex H NAL unit type class
0 . . . 16 | As defined in Table 2 | | |
17 | subset_seq_parameter_set_3dv_rbsp( ) | | non-VCL | non-VCL
18 . . . 20 | Reserved | | |
21 | Coded 3DV slice extension 3dv_slice_layer_extension_rbsp( ) | 2, 3, 4 | non-VCL | VCL
22 . . . 23 | Reserved | | non-VCL | non-VCL
24 . . . 31 | As defined in Table 2 | | non-VCL | non-VCL

The novel NAL unit type can permit an MVC decoder or a 3DV decoder to determine whether to discard or to parse the content within the subset SPS 3DV. Because the type 17 is reserved under MVC, an MVC decoder can choose to ignore or discard the data in this NAL unit. A 3DV decoder, however, can parse the data in the unit, which permits the 3DV decoder to decode the 3DV supplemental layers.

For a smart network device, for example, a router, which can recognize the novel NAL unit type, the network device may elect to discard the subset SPS 3DV should the network provider determine that the 3DV supplemental layers should not be transmitted under particular circumstances. Alternatively or additionally, the content in the subset SPS 3DV can be parsed and utilized to adapt the streaming to the network bandwidth available. For example, with the knowledge of the 3DV layer prediction structure, the 3DV layers which are not used as references may be discarded by the network device (for example, either a streaming server or a router) when the network suffers from bursty traffic.

A bitstream extractor, also referred to as a stream server, may also be used to extract various portions of a 3DV stream. The above router parsed a bitstream and made decisions about whether or not to forward (transmit) various 3DV layers. A bitstream extractor may also parse the bitstream, and make forwarding decisions based on priority, but may also tailor the extracted bitstream (also called a sub-bitstream) to a downstream device. For example, the bitstream extractor may extract only 2D video and depth layers, because the downstream receiver does not use occlusion or transparency layers. Further yet, the bitstream extractor may extract only the layers corresponding to the first two views that are in the bitstream, because the downstream receiver does not use more than two views. Additionally, however, the bitstream extractor may be capable of analyzing the 3DV SPS, as well as any MVC SPS, or other dependency information, to determine if the 2D video or depth layers use any of the occlusion or transparency layers as inter-layer references, and to determine if the first two views use any of the other views as inter-view references. If other layers or views are needed for proper decoding of the desired 3DV layers, which are the 2D video and depth layers for the first two views, then the bitstream extractor will also extract those layers and/or views.
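The dependency check performed by the bitstream extractor amounts to computing the closure of the desired layers under the "uses as reference" relation. The following Python sketch illustrates this under stated assumptions: each layer is identified by a hypothetical (view ID, layer label) pair, and the depends_on mapping stands in for whatever dependency information the extractor has recovered from the 3DV SPS, an MVC SPS, or other sources.

```python
def layers_to_extract(wanted, depends_on):
    """Return the wanted layers plus every layer they reference.

    wanted is an iterable of (view_id, layer) pairs requested downstream.
    depends_on maps a (view_id, layer) pair to the pairs it uses as
    inter-layer or inter-view references. Indirect references are followed.
    """
    needed = set()
    stack = list(wanted)
    while stack:
        layer = stack.pop()
        if layer in needed:
            continue                      # already handled, avoids cycles
        needed.add(layer)
        stack.extend(depends_on.get(layer, ()))
    return needed
```

If, for instance, the 2D video layer of view 0 used the occlusion video layer of view 0 as an inter-layer reference, a request for the 2D video and depth layers of view 0 would also return that occlusion video layer.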

Note that priority information for a 3DV layer and 3DV view may be determined by a router, or bitstream extractor. However, such priority information may also be provided in the bitstream, for example, by being placed in the NAL unit header. Such priority information may include, for example, temporal level ID, priority ID, view ID, as well as a priority ID related to 3DV information.

With reference now to FIGS. 18 and 19, methods 1800 and 1900 for encoding and decoding, respectively, NAL units for subset SPS 3DV information in accordance with implementations of embodiment 5 are illustrated. Methods 1800 and 1900 can be performed, for example, by the 3DV reference buffer 316 of encoder 300 and by the 3DV reference buffer 414 of the decoder 400, respectively.

Method 1800 may begin, for example, at step 1802, in which the encoder 300 may set a NAL unit type for a NAL unit to be 17. At step 1804, the encoder 300 may write the NAL unit header. Thereafter, at step 1806, encoder 300 can compose and write the SPS. For example, the SPS may correspond to subset_sequence_parameter_set_3dv_rbsp( ) and may be composed and written as discussed below with respect to embodiments 6 and 7.

Method 1900 may begin, for example, at step 1902, in which the decoder 400 may receive a NAL unit and read the NAL unit header. The NAL unit may correspond to the NAL unit encoded in method 1800. At step 1904, the decoder 400 may extract the NAL unit type. If the NAL unit type is set to 17, then the decoder 400 can read and parse the SPS. The SPS may, for example, correspond to subset_sequence_parameter_set_3dv_rbsp( ) and may be read and parsed as discussed below with respect to embodiments 6 and 7.
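As a minimal sketch of steps 1802-1804 and 1902-1904, the following Python fragment writes and reads a NAL unit header and tests for NAL unit type 17. It assumes the basic one-byte AVC NAL unit header layout (a 1-bit forbidden_zero_bit, a 2-bit nal_ref_idc, and a 5-bit nal_unit_type) and ignores any header extension bytes. The function names and the constant name are hypothetical.

```python
SUBSET_SPS_3DV_NAL_TYPE = 17  # NAL unit type used for subset SPS 3DV in this description

def make_nal_header(nal_ref_idc, nal_unit_type):
    """Compose a one-byte NAL unit header with forbidden_zero_bit equal to 0."""
    return ((nal_ref_idc & 0x3) << 5) | (nal_unit_type & 0x1F)

def parse_nal_header(first_byte):
    """Split the first byte of a NAL unit into its three header fields."""
    forbidden_zero_bit = (first_byte >> 7) & 0x1
    nal_ref_idc = (first_byte >> 5) & 0x3
    nal_unit_type = first_byte & 0x1F
    return forbidden_zero_bit, nal_ref_idc, nal_unit_type

def is_subset_sps_3dv(nal_unit):
    """Return True if the NAL unit carries the novel type for subset SPS 3DV."""
    _, _, nal_unit_type = parse_nal_header(nal_unit[0])
    return nal_unit_type == SUBSET_SPS_3DV_NAL_TYPE

header = make_nal_header(nal_ref_idc=3, nal_unit_type=SUBSET_SPS_3DV_NAL_TYPE)
```

A decoder or network device that does not recognize type 17 would simply fall through the final test and could ignore the NAL unit.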

Embodiment 6 Extension of SPS to Signal Parameters for 3DV Applications

As discussed above with regard to embodiments 1-4, 3DV supplemental layers may be employed to support enhanced 3D rendering capability, and thus the 3DV layer identification number (3dv_layer_id) can be signaled in the SPS. Further, as discussed above, in order to remove inter-layer redundancy, inter-layer coding can be utilized and inter-layer pictures can be added into the reference picture list to facilitate inter-layer coding. Thus, to permit the decoder to determine how to decode pictures with inter-layer references, an encoder may specify the inter-layer prediction structure in the SPS. Such an inter-layer prediction structure may, for example, correspond to structure 1500 discussed above with regard to FIG. 15.

Prior to discussing SPS construction in detail, it should be noted that in accordance with various implementations, a novel profile may be employed for a bitstream that supports 3DV content. ITU-T, “Advanced Video Coding for Generic Audiovisual Services, Recommendation ITU-T H.264”, March 2009, hereinafter referred to as “updated AVC draft,” provides a discussion of profiles and is incorporated herein by reference. In one or more implementations, the profile_idc can be set to 218. The updated AVC draft describes other existing profiles in AVC/MVC.

Table 12, provided below, details the process undergone for the function subset_sequence_parameter_set_3dv_rbsp( ) mentioned above with regard to embodiment 5. In particular, Table 12, at the statement else if( profile_idc = = 218 ) { . . . }, illustrates one high level implementation of subset SPS 3DV in accordance with embodiment 6. The detailed signaling can be implemented in the function seq_parameter_set_3dv_extension( ) as shown, for example, in Table 13 below. A profile_idc of 218 represents a new profile for the MVC standard, and is a 3DV profile.

TABLE 12 subset_seq_parameter_set_3dv_rbsp( )

subset_seq_parameter_set_3dv_rbsp( ) {                                                    C   Descriptor
  seq_parameter_set_data( )                                                               0
  if( profile_idc = = 83 || profile_idc = = 86 ) {
    seq_parameter_set_svc_extension( )  /* specified in Annex G of updated AVC draft */   0
    svc_vui_parameters_present_flag                                                       0   u(1)
    if( svc_vui_parameters_present_flag = = 1 )
      svc_vui_parameters_extension( )  /* specified in Annex G of updated AVC draft */    0
  } else if( profile_idc = = 118 ) {
    bit_equal_to_one  /* equal to 1 */                                                    0   f(1)
    seq_parameter_set_mvc_extension( )  /* specified in Annex H of updated AVC draft */   0
    mvc_vui_parameters_present_flag                                                       0   u(1)
    if( mvc_vui_parameters_present_flag = = 1 )
      mvc_vui_parameters_extension( )  /* specified in Annex H of updated AVC draft */    0
  } else if( profile_idc = = 218 ) {
    bit_equal_to_one  /* equal to 1 */                                                    0   f(1)
    seq_parameter_set_3dv_extension( )  /* specified in Table 13 or 14 */                 0
  }
  additional_extension2_flag                                                              0   u(1)
  if( additional_extension2_flag = = 1 )
    while( more_rbsp_data( ) )
      additional_extension2_data_flag                                                     0   u(1)
  rbsp_trailing_bits( )                                                                   0
}

FIGS. 20 and 21 illustrate high level flow diagrams for methods for encoding 2000 and decoding 2100, respectively, an SPS in accordance with various implementations of embodiment 6. Methods 2000 and 2100 encode and decode, respectively, an SPS in the form given by, for example, Table 12. Table 12 could be used, for example, with NAL unit type 17. It should be noted that encoder 300 of FIG. 3 can be configured to perform method 2000 and decoder 400 of FIG. 4 can be configured to perform method 2100.

Method 2000 can begin at step 2002, in which the encoder 300 may set the profile_idc. As indicated above, the profile_idc may, for example, be set to 218 for subset SPS 3DV.

At step 2004, the encoder 300 may write sequence parameter set data. For example, such data may correspond to any SPS data described in the updated AVC draft with respect to the seq_parameter_set_data( ) syntax structure.

At step 2006, the encoder 300 may determine whether the profile_idc is set to 83 or 86. If the profile_idc is set to 83 or 86, then the method may proceed to step 2008, at which the encoder 300 may write the seq_parameter_set_svc_extension( ) and set and write the svc_vui_parameters_present_flag, as discussed in the updated AVC draft. In addition, at step 2008, if the svc_vui_parameters_present_flag is set to 1, then the encoder 300 may write the svc_vui_parameters_extension( ) as discussed in the updated AVC draft. Thereafter, the method may proceed to step 2010, which is discussed in more detail below.

Returning to step 2006, if the profile_idc is not set to 83 or 86, then the method may proceed to step 2014, at which the encoder 300 may determine whether the profile_idc is set to 118. If the profile_idc is set to 118, then the method may proceed to step 2016, at which the encoder 300 may set bit_equal_to_one equal to 1, write bit_equal_to_one, write the seq_parameter_set_mvc_extension( ), and set and write the mvc_vui_parameters_present_flag, as described in the updated AVC draft. If the mvc_vui_parameters_present_flag is equal to 1, then the encoder 300 may write the mvc_vui_parameters_extension( ) as described in the updated AVC draft. Thereafter, the method may proceed to step 2010, which is discussed in more detail below.

If, at step 2014, the encoder 300 determines that the profile_idc is not set to 118, then the method may proceed to step 2018, in which the encoder 300 may determine whether the profile_idc is set to 218. If the profile_idc is not set to 218, then the method may proceed to step 2022, in which the encoder 300 can determine that the profile_idc is unknown and may output an error message.

However, if the profile_idc is set to 218, then the encoder 300 may perform step 2020, in which the encoder 300 may set bit_equal_to_one equal to 1 and write bit_equal_to_one. As noted above, bit_equal_to_one is described in the updated AVC draft. At step 2020, the encoder 300 may further write the seq_parameter_set_3dv_extension( ), which is described in more detail below with respect to Tables 13 and 14 and FIGS. 22-25. As discussed herein below, the seq_parameter_set_3dv_extension( ) can indicate or convey inter-layer dependencies to a decoder to permit the decoder to determine appropriate predictive references for pictures during their decoding. Thereafter, the method may proceed to step 2010. At step 2010, the encoder 300 may set the additional_extension2_flag and, if the additional_extension2_flag is set to 1, then the encoder 300 may write all additional_extension2_data_flags, as discussed in the updated AVC draft. At step 2012, the encoder 300 may write rbsp_trailing_bits( ) as described in the updated AVC draft and thereafter the method may end.

Turning now to FIG. 21, illustrating a method 2100 for decoding an SPS that may, for example, have been generated in accordance with method 2000, the method 2100 may begin at step 2102, in which the decoder 400 may decode the sequence parameter set data, seq_parameter_set_data( ), from a received bitstream and may set the profile_idc, as discussed in the updated AVC draft. At step 2104, the decoder 400 may determine whether the profile_idc is set to 83 or 86. If the profile_idc is set to 83 or 86, then the method may proceed to step 2106, at which the decoder 400 may decode the seq_parameter_set_svc_extension( ) and decode the svc_vui_parameters_present_flag, as discussed in the updated AVC draft. In addition, at step 2106, if the svc_vui_parameters_present_flag is set to 1, then the decoder 400 may decode the svc_vui_parameters_extension( ) as discussed in the updated AVC draft. Thereafter, the method may proceed to step 2108, which is discussed in more detail below.

Returning to step 2104, if the profile_idc is not set to 83 or 86, then the method may proceed to step 2112, at which the decoder 400 may determine whether the profile_idc is set to 118. If the profile_idc is set to 118, then the method may proceed to step 2114, at which the decoder 400 may decode bit_equal_to_one, decode the seq_parameter_set_mvc_extension( ), and decode the mvc_vui_parameters_present_flag, as described in the updated AVC draft. Additionally, if the mvc_vui_parameters_present_flag is set to 1, then the decoder 400 may decode the mvc_vui_parameters_extension( ) as described in the updated AVC draft. Thereafter, the method may proceed to step 2108, which is discussed in more detail below.

If, at step 2112, the decoder 400 determines that the profile_idc is not set to 118, then the method may proceed to step 2116, in which the decoder 400 may determine whether the profile_idc is set to 218. If the profile_idc is not set to 218, then the method may proceed to step 2120, in which the decoder 400 can determine that an unknown profile_idc has been read and may output an error message.

However, if the profile_idc is set to 218, then the decoder 400 may perform step 2118, in which the decoder 400 may decode bit_equal_to_one and may further decode the seq_parameter_set_3dv_extension( ), which is described in more detail below with respect to Tables 13 and 14 and FIGS. 22-25. Thereafter, the method may proceed to step 2108.

At step 2108, the decoder 400 may decode the additional_extension2_flag and, if the additional_extension2_flag is set to 1, then the decoder 400 may decode all additional_extension2_data_flags, as discussed in the updated AVC draft. At step 2110, the decoder 400 may decode rbsp_trailing_bits( ) as described in the updated AVC draft, and thereafter the method may end.
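The branching common to methods 2000 and 2100 can be summarized by the following Python sketch, which lists the syntax elements of Table 12 in the order in which they would be written or decoded for a given profile_idc. It is illustrative only: the function name and parameters are hypothetical, and the elements are returned as names rather than being coded bit by bit.

```python
def subset_sps_3dv_elements(profile_idc, vui_present=False, extension2=False):
    """List the Table 12 syntax elements, in order, for the given profile_idc."""
    elements = ["seq_parameter_set_data()"]
    if profile_idc in (83, 86):            # SVC branch (steps 2008 / 2106)
        elements += ["seq_parameter_set_svc_extension()",
                     "svc_vui_parameters_present_flag"]
        if vui_present:
            elements.append("svc_vui_parameters_extension()")
    elif profile_idc == 118:               # MVC branch (steps 2016 / 2114)
        elements += ["bit_equal_to_one",
                     "seq_parameter_set_mvc_extension()",
                     "mvc_vui_parameters_present_flag"]
        if vui_present:
            elements.append("mvc_vui_parameters_extension()")
    elif profile_idc == 218:               # 3DV branch (steps 2020 / 2118)
        elements += ["bit_equal_to_one",
                     "seq_parameter_set_3dv_extension()"]
    else:                                  # unknown profile (steps 2022 / 2120)
        raise ValueError(f"unknown profile_idc: {profile_idc}")
    elements.append("additional_extension2_flag")
    if extension2:
        elements.append("additional_extension2_data_flag (repeated)")
    elements.append("rbsp_trailing_bits()")
    return elements

elements_3dv = subset_sps_3dv_elements(218)
```

Raising an error for an unrecognized profile_idc mirrors the error message of steps 2022 and 2120; Table 12 itself does not specify that behavior.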

As mentioned above, Table 13 shows one implementation of seq_parameter_set_3dv_extension( ) where the 3dv_layer_id and the inter-layer prediction structure are signaled explicitly. Such an implementation provides a great deal of flexibility because different ordering of the 3DV layers and different inter-layer prediction structures can be specified.

TABLE 13 One implementation of seq_parameter_set_3dv_extension( )

seq_parameter_set_3dv_extension( ) {                      C   Descriptor
  seq_parameter_set_mvc_extension( )
  num_3dv_layer_minus1                                        ue(v)
  for( i = 0; i <= num_3dv_layer_minus1; i++ )
    3dv_layer_id[ i ]                                         ue(v)
  for( i = 1; i <= num_3dv_layer_minus1; i++ ) {
    num_3dv_layer_refs_l0[ i ]                                ue(v)
    for( j = 0; j < num_3dv_layer_refs_l0[ i ]; j++ )
      3dv_layer_ref_l0[ i ][ j ]                              ue(v)
    num_3dv_layer_refs_l1[ i ]                                ue(v)
    for( j = 0; j < num_3dv_layer_refs_l1[ i ]; j++ )
      3dv_layer_ref_l1[ i ][ j ]                              ue(v)
  }
}

The semantics of Table 13 are given as follows:

num_3dv_layer_minus1 plus 1 indicates the number of 3DV layers.

3dv_layer_id[i] specifies the i^(th) 3DV layer identification number.

num_3dv_layer_refs_l0[i] specifies the number of inter-layer references in reference picture list 0 for the 3DV layer with the 3DV layer identification number being 3dv_layer_id[i].

3dv_layer_ref_l0[i][j] specifies the 3DV layer identification number which is used as the j^(th) inter-layer reference in reference picture list 0 for the 3DV layer with the 3DV layer identification number being 3dv_layer_id[i].

num_3dv_layer_refs_l1[i] specifies the number of inter-layer references in reference picture list 1 for the 3DV layer with the 3DV layer identification number being 3dv_layer_id[i].

3dv_layer_ref_l1[i][j] specifies the 3DV layer identification number which is used as the j^(th) inter-layer reference in reference picture list 1 for the 3DV layer with the 3DV layer identification number being 3dv_layer_id[i].
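To make the semantics above concrete, the following Python sketch parses the ue(v)-coded fields of Table 13 from a byte string. It is a minimal sketch under stated assumptions: ue(v) is taken to be the unsigned Exp-Golomb code of AVC, the preceding seq_parameter_set_mvc_extension( ) is assumed to have been consumed already, and the class and function names (BitReader, parse_3dv_extension) are hypothetical.

```python
class BitReader:
    """Reads bits most-significant-bit first from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current bit position

    def read_bit(self):
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def read_ue(self):
        """Decode one unsigned Exp-Golomb codeword, ue(v)."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        suffix = 0
        for _ in range(leading_zeros):
            suffix = (suffix << 1) | self.read_bit()
        return (1 << leading_zeros) - 1 + suffix

def parse_3dv_extension(reader):
    """Parse the Table 13 fields that follow seq_parameter_set_mvc_extension( )."""
    num_layers = reader.read_ue() + 1                      # num_3dv_layer_minus1 + 1
    layer_ids = [reader.read_ue() for _ in range(num_layers)]
    refs_l0, refs_l1 = {}, {}
    for i in range(1, num_layers):                         # the loop starts at i = 1
        count_l0 = reader.read_ue()                        # num_3dv_layer_refs_l0[i]
        refs_l0[layer_ids[i]] = [reader.read_ue() for _ in range(count_l0)]
        count_l1 = reader.read_ue()                        # num_3dv_layer_refs_l1[i]
        refs_l1[layer_ids[i]] = [reader.read_ue() for _ in range(count_l1)]
    return layer_ids, refs_l0, refs_l1

# Bits 010 1 010 010 1 1, padded with zeros: two layers (ids 0 and 1),
# where layer 1 has layer 0 as its single list-0 reference and no list-1 reference.
parsed = parse_3dv_extension(BitReader(bytes([0x54, 0xB0])))
```

Because the second loop of Table 13 starts at i = 1, no inter-layer references are parsed for the first listed layer, and the sketch accordingly leaves that layer out of both reference maps.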

To better illustrate how the seq_parameter_set_3dv_extension( ) of Table 13 can be employed in embodiment 6, reference is made to FIGS. 22 and 23, illustrating methods for encoding 2200 and decoding 2300, respectively, the subset SPS 3DV extension. It should be understood that method 2200 may be implemented by encoder 300 while method 2300 may be implemented by decoder 400.

Method 2200 may begin at step 2202, in which the encoder 300 may encode the seq_parameter_set_mvc_extension( ), which is described in the updated AVC draft.

At step 2204, the encoder 300 may set and encode num_3dv_layer_minus1. As provided above, num_3dv_layer_minus1 indicates the total number of 3DV layers employed in a 3DV view of 3DV content to be encoded. For convenience in coding and decoding, the numeric value of num_3dv_layer_minus1 is one less than the actual number of 3DV layers.

As noted above, “i” denotes a 3DV layer id number. For example, the 3DV layer id may correspond to the 3DV layer ids defined in Table 1 above. Here, at step 2208, the encoder 300 may set and encode the 3DV layer IDs for each type of 3DV layer employed in the 3DV content to be encoded. Thus, the encoder 300 iteratively processes each 3DV layer id in loop 2206 until the total number of 3DV layers employed in a 3DV view of 3DV content is reached.

At loop 2210, as noted in the first line of loop 2210, the encoder 300 successively processes each 3DV layer id in loop 2210 to set and encode 3DV inter-layer references for each 3DV layer for each reference picture list type, 0 and, potentially, 1. For example, at step 2212, the encoder 300 may set and encode the total number of inter-layer references (num_3dv_layer_refs_l0[i]) in reference picture list 0 for the 3DV layer (denoted by ‘i’) to which the reference picture list is allocated. It should be noted that the number of inter-layer references in any reference picture list is dependent on the inter-layer dependency structure employed. For example, in structure 1500 of FIG. 15, each 3DV layer has at most one inter-layer reference in a reference picture list allocated to the 3DV layer. However, other inter-layer dependency or prediction structures can be employed, such as the structure discussed herein below with respect to embodiment 7.

After the total number of inter-layer references for 3DV layer ‘i’ in reference picture list ‘0’ is set, the encoder 300 may, at step 2216, set and encode the inter-layer references for reference picture list ‘0’ of 3DV layer ‘i.’ In particular, the encoder 300 can specify the 3DV layer ids (3dv_layer_ref_l0[i][j]) of the inter-layer references in reference picture list ‘0’ of 3DV layer ‘i.’ In FIG. 22, as well as Table 13, inter-layer references in reference picture list ‘0’ of 3DV layer ‘i’ can be denoted by ‘j,’ such that step 2216 can be iterated in loop 2214 until the total number of inter-layer references for 3DV layer ‘i’ for reference picture list ‘0’ has been reached.

The encoder 300 may further be configured to provide inter-layer references for any reference picture list ‘1’ of 3DV layer ‘i.’ However, it should be understood that the following steps of method 2200 may be skipped should the particular 3DV layer ‘i’ not have a reference picture list ‘1.’ If the 3DV layer ‘i’ has a reference picture list ‘1,’ the method may proceed to step 2218, in which the encoder 300 may set and encode the total number of inter-layer references (num_3dv_layer_refs_l1[i]) in reference picture list 1 for the 3DV layer ‘i’ to which the reference picture list ‘1’ is allocated.

After the total number of inter-layer references for 3DV layer ‘i’ in reference picture list ‘1’ is set, the encoder 300 may, at step 2222, set and encode the inter-layer references for reference picture list ‘1’ of 3DV layer ‘i.’ In particular, the encoder 300 can specify the 3DV layer ids (3dv_layer_ref_l1[i][j]) of the inter-layer references in reference picture list ‘1’ of 3DV layer ‘i.’ Similar to the discussion provided above with regard to reference picture list ‘0’ for 3DV layer ‘i,’ inter-layer references in reference picture list ‘1’ of 3DV layer ‘i’ can be denoted by ‘j,’ such that step 2222 can be iterated in loop 2220 until the total number of inter-layer references for 3DV layer ‘i’ for reference picture list ‘1’ has been reached.

In addition, as indicated above, at loop 2210, steps 2212 and 2218 and loops 2214 and 2220 can be iterated for each layer of the 3DV layers employed in a 3DV view of 3DV content to be encoded until all such layers have been processed.
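The loops of method 2200 can be sketched in Python as follows. The sketch assumes that ue(v) denotes the unsigned Exp-Golomb code of AVC, emits the codewords as a string of '0' and '1' characters for readability, and omits the seq_parameter_set_mvc_extension( ) of step 2202. The function names ue_bits and encode_3dv_extension and the dictionary-based arguments are hypothetical.

```python
def ue_bits(value):
    """Unsigned Exp-Golomb codeword for value, as a string of '0'/'1'."""
    code = value + 1
    return "0" * (code.bit_length() - 1) + format(code, "b")

def encode_3dv_extension(layer_ids, refs_l0, refs_l1):
    """Emit the Table 13 fields that follow seq_parameter_set_mvc_extension( )."""
    bits = [ue_bits(len(layer_ids) - 1)]              # step 2204: num_3dv_layer_minus1
    bits += [ue_bits(layer_id) for layer_id in layer_ids]   # loop 2206 / step 2208
    for layer_id in layer_ids[1:]:                    # loop 2210 starts at i = 1
        for refs in (refs_l0, refs_l1):               # list 0, then list 1
            layer_refs = refs.get(layer_id, [])
            bits.append(ue_bits(len(layer_refs)))     # steps 2212 / 2218
            bits += [ue_bits(ref) for ref in layer_refs]    # steps 2216 / 2222
    return "".join(bits)

# Two layers (ids 0 and 1); layer 1 uses layer 0 as its only list-0 reference.
encoded = encode_3dv_extension([0, 1], {1: [0]}, {})
```

A layer without a reference picture list 1 is represented here by a count of zero, which is one possible reading of the skipped steps described above; Table 13 as written always codes num_3dv_layer_refs_l1[ i ].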

Turning now to FIG. 23, a method 2300 for decoding an SPS 3DV extension received in a bitstream using the seq_parameter_set_3dv_extension( ) is described. Method 2300 may begin at step 2302, in which the decoder 400 may decode the seq_parameter_set_mvc_extension( ), which is described in the updated AVC draft.

At step 2304, the decoder 400 may decode and obtain num_3dv_layer_minus1. As stated above, num_3dv_layer_minus1 indicates the total number of 3DV layers employed in a 3DV view of 3DV content, and its numeric value is one less than the actual number of 3DV layers.

As noted above, “i” denotes a 3DV layer id number. For example, the 3DV layer id may correspond to the 3DV layer ids defined in Table 1 above. Here, at step 2308, the decoder 400 may decode and obtain the 3DV layer IDs for each type of 3DV layer employed in the 3DV content. Thus, the decoder 400 iteratively processes each 3DV layer id in loop 2306 until the total number of 3DV layers employed in a 3DV view of 3DV content is reached and each 3DV layer id is obtained.

At loop 2310, as noted in the first line of loop 2310, the decoder 400 successively processes each 3DV layer id in loop 2310 to decode and obtain 3DV inter-layer references for each 3DV layer for each reference picture list type, 0 and, potentially, 1. For example, at step 2312, the decoder 400 may decode and obtain the total number of inter-layer references (num_3dv_layer_refs_l0[i]) in reference picture list 0 for the 3DV layer (denoted by ‘i’) to which the reference picture list is allocated. It should be noted that the number of inter-layer references in any reference picture list is dependent on the inter-layer dependency structure employed. For example, in structure 1500 of FIG. 15, each 3DV layer has at most one inter-layer reference in a reference picture list allocated to the 3DV layer. However, other inter-layer dependency or prediction structures can be employed, such as the structure discussed herein below with respect to embodiment 7.

After the total number of inter-layer references for 3DV layer ‘i’ in reference picture list ‘0’ is obtained, the decoder 400 may, at step 2316, decode and obtain the inter-layer references for reference picture list ‘0’ of 3DV layer ‘i.’ In particular, the decoder 400 can obtain the 3DV layer ids (3dv_layer_ref_l0[i][j]) of the inter-layer references in reference picture list ‘0’ of 3DV layer ‘i.’ In FIG. 23, as well as Table 13, inter-layer references in reference picture list ‘0’ of 3DV layer ‘i’ can be denoted by ‘j,’ such that step 2316 can be iterated in loop 2314 until the total number of inter-layer references for 3DV layer ‘i’ for reference picture list ‘0’ has been reached.

The decoder 400 may further be configured to obtain inter-layer references for any reference picture list ‘1’ of 3DV layer ‘i.’ However, it should be understood that the following steps of method 2300 may be skipped should the particular 3DV layer ‘i’ not have a reference picture list ‘1.’ If the 3DV layer ‘i’ has a reference picture list ‘1,’ the method may proceed to step 2318, in which the decoder 400 may decode and obtain the total number of inter-layer references (num_3dv_layer_refs_l1[i]) in reference picture list 1 for the 3DV layer ‘i’ to which the reference picture list ‘1’ is allocated.

After the total number of inter-layer references for 3DV layer ‘i’ in reference picture list ‘1’ is obtained, the decoder 400 may, at step 2322, decode and obtain the inter-layer references for reference picture list ‘1’ of 3DV layer ‘i.’ In particular, the decoder 400 can obtain the 3DV layer ids (3dv_layer_ref_l1[i][j]) of the inter-layer references in reference picture list ‘1’ of 3DV layer ‘i.’ Similar to the discussion provided above with regard to reference picture list ‘0’ for 3DV layer ‘i,’ inter-layer references in reference picture list ‘1’ of 3DV layer ‘i’ can be denoted by ‘j,’ such that step 2322 can be iterated in loop 2320 until the total number of inter-layer references for 3DV layer ‘i’ for reference picture list ‘1’ has been reached.

In addition, as indicated above, at loop 2310, steps 2312 and 2318 and loops 2314 and 2320 can be iterated for each layer of the 3DV layers employed in a 3DV view of 3DV content until all such layers have been processed. Thus, the decoder 400 may reconstruct the reference picture list(s) for each 3DV layer to thereby permit the decoder 400 to determine the inter-layer references for each 3DV layer picture received during decoding of the pictures.

It should be noted that when a network device parses the information on a 3DV layer and the prediction structure, it may allocate different priorities during transmission for the NAL units from different 3DV layers. Thus, when congestion occurs, some NAL units from “higher” 3DV supplemental layers (for example, higher 3D layer ids in Table 1) may be discarded to relieve the traffic.
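One possible discard policy of this kind is sketched below in Python. It is only an assumption-laden illustration: NAL units are modeled as (layer id, size in bytes) pairs, a lower 3DV layer id is assumed to mean a higher transmission priority, and the function name forward_under_congestion and the byte budget are hypothetical. The sketch lowers a layer id threshold until the remaining NAL units fit within the budget.

```python
def forward_under_congestion(nal_units, budget_bytes):
    """Keep NAL units of the lowest layer ids that fit within budget_bytes.

    nal_units is a list of (layer_id, size_bytes) pairs in transmission order.
    """
    if not nal_units:
        return []
    threshold = max(layer_id for layer_id, _ in nal_units)
    while threshold >= 0:
        kept = [(layer_id, size) for layer_id, size in nal_units
                if layer_id <= threshold]
        if sum(size for _, size in kept) <= budget_bytes:
            return kept
        threshold -= 1      # discard the highest remaining layer and retry
    return []               # not even the lowest layer fits

units = [(0, 100), (1, 40), (2, 60), (0, 100), (1, 40)]
forwarded = forward_under_congestion(units, budget_bytes=300)
```

Discarding whole layers, rather than individual NAL units, keeps the forwarded stream consistent with a prediction structure in which lower layers serve as references for higher layers; a device with knowledge of the signaled structure could instead discard only layers that are not used as references.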

Embodiment 7 Alternative Extension of SPS to Signal Parameters for 3DV Applications

In certain implementations, because the potential number of 3DV layers used may be limited, and, in turn, because the content in the 3DV layers may have specific and consistent characteristics, the prediction structure used to encode and decode the 3DV content may be preconfigured and known to both encoders and decoders. Thus, the specific prediction or inter-layer dependency structure need not be signaled and conveyed in an explicit way, as, for example, in Table 13 of embodiment 6. Rather, the inter-layer prediction structure may be known to both the encoder and decoder in embodiment 7, thereby simplifying the conveyance of the extended SPS for 3DV to the decoder.

To provide a simple example, the following 3DV layers defined above are employed: 2D video layer, depth layer, occlusion video layer, occlusion depth layer, and transparency layer.

Below, an example of a predefined inter-layer prediction structure that can be employed in accordance with various implementations is provided. However, it should be understood that other predefined inter-layer prediction structures can be utilized in other implementations. In the structure, for a 2D video layer, no 3DV supplemental layers are used as inter-layer prediction references. For the depth layer, the 2D video layer is used as an inter-layer prediction reference. For the occlusion video layer, the 2D video layer and the depth layer are used as inter-layer references. For the occlusion depth layer, the 2D video layer and the depth layer are used as inter-layer references. For the transparency layer, the 2D video layer and the depth layer are used as inter-layer references.
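The example structure above can be captured as a small lookup table shared by encoder and decoder, as in the following Python sketch. The layer names used as keys, the table name PREDEFINED_REFS, and the function inter_layer_refs are hypothetical. Restricting the references to layers that are actually present in a view is an assumption of the sketch and is not stated in the structure above.

```python
# Example predefined inter-layer prediction structure, keyed by layer name.
PREDEFINED_REFS = {
    "2d_video": [],
    "depth": ["2d_video"],
    "occlusion_video": ["2d_video", "depth"],
    "occlusion_depth": ["2d_video", "depth"],
    "transparency": ["2d_video", "depth"],
}

def inter_layer_refs(layer, present_layers):
    """Return the predefined references of a layer that exist in the view.

    Filtering by presence is an assumption of this sketch.
    """
    return [ref for ref in PREDEFINED_REFS[layer] if ref in present_layers]

view_layers = {"2d_video", "depth", "occlusion_video"}
occlusion_refs = inter_layer_refs("occlusion_video", view_layers)
```

Because the table is fixed, only the presence of each layer needs to be conveyed for the decoder to arrive at the same references as the encoder.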

Here in embodiment 7, because the inter-layer prediction structure can be pre-defined, the extended SPS for 3DV can simply convey whether a certain layer is present for each 3DV view as shown in Table 14. Accordingly, the seq_parameter_set_3dv_extension( ) can simply employ flags for each possible 3DV layer to indicate whether they are employed in each 3DV view in the 3DV content. Thus, the extended SPS for 3DV need not signal or convey the inter-layer prediction structure in any explicit way. In one implementation, the inter-layer prediction structure is constant and cannot be changed. In another implementation, the inter-layer prediction structure is set using Table 13 (for example, in an initial occurrence, or periodic occurrences, of Table 12), and otherwise Table 14 is used to communicate the extension information. It should be understood that Tables 12-14 may be retransmitted to the decoder as often as desired in accordance with design choice, and in one implementation are retransmitted only when there is a change to the information.

TABLE 14 A second implementation of seq_parameter_set_3dv_extension( )

seq_parameter_set_3dv_extension( ) {            C   Descriptor
  seq_parameter_set_mvc_extension( )
  for( i = 0; i <= num_views_minus1; i++ ) {
    video_layer_flag[ i ]                           u(1)
    depth_layer_flag[ i ]                           u(1)
    occlusion_layer_video_flag[ i ]                 u(1)
    occlusion_layer_depth_flag[ i ]                 u(1)
    transparency_layer_flag[ i ]                    u(1)
  }
}

To better illustrate how the seq_parameter_set_3dv_extension( ) of Table 14 can be utilized in embodiment 7, reference is made to FIGS. 24 and 25, illustrating methods for encoding 2400 and decoding 2500, respectively, subset SPS 3DV. It should be understood that method 2400 may be implemented by encoder 300 while method 2500 may be implemented by decoder 400.

Method 2400 may begin at step 2402, in which the encoder 300 may encode the seq_parameter_set_mvc_extension( ), which is described in the updated AVC draft. The encoder 300 may then perform loop 2404, in which the encoder 300 may set the 3DV layer flags to indicate whether the respective 3DV layers are present for a particular 3DV view. Here, num_views_minus1 indicates the total number of 3DV views employed in the 3DV content. For example, in the examples provided in FIGS. 10-12, three 3DV views are employed (3DV view 0-3DV view 2). For convenience in coding and decoding, the numeric value of num_views_minus1 is one less than the actual number of 3DV views. The encoder 300 can iterate steps 2406-2414 for each 3DV view ‘i’ until the total number of 3DV views employed in the 3DV content is reached.

In particular, in loop 2404, the encoder 300 may set and encode the 2D video layer flag at step 2406 to indicate whether the 2D video layer is present in the 3DV view ‘i,’ may set and encode the (2D) depth layer flag at step 2408 to indicate whether the depth layer is present in the 3DV view ‘i,’ may set and encode the occlusion video layer flag at step 2410 to indicate whether the occlusion video layer is present in the 3DV view ‘i,’ may set and encode the occlusion depth layer flag at step 2412 to indicate whether the occlusion depth layer is present in the 3DV view ‘i,’ and may set and encode the transparency layer flag at step 2414 to indicate whether the transparency layer is present in the 3DV view ‘i.’

Turning now to method 2500 for decoding subset SPS 3DV using Table 14, method 2500 may begin at step 2502, in which the decoder 400 may decode the seq_parameter_set_mvc_extension( ), which is described in the updated AVC draft. It should be noted that decoder 400 in method 2500 may receive a bitstream encoded by encoder 300 in accordance with method 2400. The decoder 400 may also perform loop 2504, in which the decoder 400 may decode the 3DV layer flags to determine whether the respective 3DV layers are present for a particular 3DV view. As discussed above with regard to method 2400, num_views_minus1 indicates the total number of 3DV views employed in received 3DV content. The decoder 400 can iterate steps 2506-2514 for each 3DV view ‘i’ until the total number of 3DV views employed in the 3DV content is reached.

In particular, in loop 2504, the decoder 400 may decode and obtain the 2D video layer flag at step 2506 to determine whether the 2D video layer is present in the 3DV view ‘i,’ may decode and obtain the (2D) depth layer flag at step 2508 to determine whether the depth layer is present in the 3DV view ‘i,’ may decode and obtain the occlusion video layer flag at step 2510 to determine whether the occlusion video layer is present in the 3DV view ‘i,’ may decode and obtain the occlusion depth layer flag at step 2512 to determine whether the occlusion depth layer is present in the 3DV view ‘i,’ and may decode and obtain the transparency layer flag at step 2514 to determine whether the transparency layer is present in the 3DV view ‘i.’
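Loops 2404 and 2504 can be sketched together as a round trip over the five u(1) flags of Table 14. The Python sketch below is illustrative only: the flags are carried as a string of '0' and '1' characters, the seq_parameter_set_mvc_extension( ) (which supplies num_views_minus1) is not modeled, and the function names write_layer_flags and read_layer_flags are hypothetical.

```python
# Flag names in the order in which Table 14 codes them for each view.
FLAG_ORDER = (
    "video_layer_flag",
    "depth_layer_flag",
    "occlusion_layer_video_flag",
    "occlusion_layer_depth_flag",
    "transparency_layer_flag",
)

def write_layer_flags(views):
    """Loop 2404: emit five u(1) flags per 3DV view as '0'/'1' characters."""
    return "".join("1" if view.get(name) else "0"
                   for view in views for name in FLAG_ORDER)

def read_layer_flags(bits, num_views_minus1):
    """Loop 2504: recover the five flags for each of the 3DV views."""
    views, pos = [], 0
    for _ in range(num_views_minus1 + 1):
        view = {}
        for name in FLAG_ORDER:
            view[name] = bits[pos] == "1"
            pos += 1
        views.append(view)
    return views

# View 0 carries 2D video and depth; view 1 carries 2D video only.
views = [{"video_layer_flag": True, "depth_layer_flag": True},
         {"video_layer_flag": True}]
flag_bits = write_layer_flags(views)
decoded_views = read_layer_flags(flag_bits, num_views_minus1=1)
```

With a predefined inter-layer prediction structure, these presence flags are all the decoder needs in order to determine which inter-layer references apply in each view.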

As discussed above, the decoder 400 may reconstruct the reference picture list(s) for each 3DV layer in each 3DV view to thereby permit the decoder 400 to determine the inter-layer references for each 3DV layer picture received during decoding of the pictures.

Additional Embodiments

With reference now to FIGS. 26 and 27, methods 2600 and 2700 for encoding and decoding 3DV content are illustrated. It should be understood that any one or more aspects discussed herein, and combinations thereof, with respect to embodiments can be implemented in or with methods 2600 and 2700. For example, as discussed further herein below, embodiments 1-3, taken singly or in any combination, can be implemented in and by methods 2600 and 2700. Furthermore, it should also be noted that encoder 300 of FIG. 3 and decoder 400 of FIG. 4 can be used to implement methods 2600 and 2700, respectively.

Method 2600 can begin at step 2602, in which the encoder 300 can encode multiple pictures, where the multiple pictures describe different 3D information for a given view at a given time. For example, any one or more of the layer encoders discussed above with respect to encoder 300 can be used to implement the encoding of multiple pictures in accordance with any one or more of embodiments 1, 2, and/or 3. The multiple pictures may be, for example, a 2D video layer picture and a depth layer picture. The 3D information described by the 2D video layer picture may be, for example, the 2D video. Similarly, the 3D information described by the depth layer picture may be, for example, the depth information. The 2D video information and the depth information are both examples of 3D information for a given view at a given time.

For purposes of describing methods of additional embodiments, a “picture” can be equivalent to a “frame” discussed above with respect to various embodiments. Further, a picture can correspond to any one or more 3DV layers discussed above. For example, a 2D view 1010 and a depth view 1008 can each constitute a separate picture. Additionally, any 2D view layer 1118, 1122, 1136, 1218, 1222, 1236 and/or any depth layer 1120, 1124, 1220, 1224, discussed above with respect to FIGS. 11 and/or 12, can each constitute a separate picture. Moreover, other 3DV supplemental layers, as discussed above, not explicitly illustrated in FIGS. 11 and 12 may also each constitute a separate picture. Furthermore, any one or more of the 3DV views discussed above may constitute a given view at a given time, such as 3D views 0, 1 and 2 at times T0 and T1, discussed above with regard to FIGS. 11 and 12.

At step 2604, the encoder 300 can generate syntax elements that indicate, for the encoded multiple pictures, how the encoded picture fits into a structure that supports 3D processing, the structure defining content types for the multiple pictures. For example, the 3DV reference buffer 316 can generate syntax elements in accordance with any one or more of embodiments 1, 2 and/or 3. The syntax elements may, for example, be the 3DV prefix unit 16 discussed above with regard to embodiment 1, the NAL prefix unit 14 and/or the NAL unit 20 of embodiment 2, and/or the NAL unit 21 of embodiment 3. As discussed above, the novel NAL units according to embodiments 1, 2 and 3 can indicate, for encoded 3DV layers, how each layer fits into a structure, such as structure 1000 of FIG. 10, that supports 3D processing. Further, use of a novel NAL unit, such as NAL units 16 and 21, can indicate that a 3DV structure, such as that illustrated in FIG. 10, has been used in the bitstream. As noted above, the structure 1000 can define different content types, such as different types of 3DV layers. It should be understood that a structure can correspond to a set of 3DV views, as indicated in FIG. 10, and/or can correspond to a set of layers within a 3DV view. It should also be understood that encoder 300 can encode a picture using a different encoded picture as a reference, thereby providing inter-layer coding between pictures of different content types. For example, using FIG. 10 as an example, a depth view of view 1004 can be dependent from and reference a different layer, such as the 2D view of view 1004, thereby providing inter-layer coding. In addition, the coding structure of FIG. 10 can be configured such that a 2D view of view 1004 can be dependent from and reference a different layer, such as a depth layer, of view 1006. Other types of inter-layer coding are possible, as indicated above, and can be implemented by one of ordinary skill in the art in view of the teachings provided herein.

At step 2606, the encoder 300 can generate a bitstream that includes the encoded multiple pictures and the syntax elements, where the inclusion of the syntax elements provides, at a coded-bitstream level, indications of relationships between the encoded multiple pictures in the structure. For example, the 3DV reference buffer 316 may generate a bitstream 318, which may comprise any of the encoded bitstreams generated in accordance with embodiments 1, 2 and/or 3, as discussed above. Thus, the bitstream can include multiple encoded pictures, such as any one or more of the layer frames discussed above with regard to FIGS. 10-12, and can also include any one or more of 3DV prefix unit 16 of embodiment 1, the NAL prefix unit 14 and/or the NAL unit 20 of embodiment 2, and/or the NAL unit 21 of embodiment 3, which, as discussed above, can provide, at a coded-bitstream level, indications of relationships between the encoded multiple pictures in the structure. For example, the syntax elements may indicate the dependencies and relationships between pictures or layers in the structure of FIG. 10 or other structures that support 3DV content. For example, the syntax elements may provide an indication of how the pictures should be combined to generate 3DV content.

It should be understood that in accordance with various embodiments, the set of layer encoders 304-314 of encoder 300 can be configured to perform step 2602. Further, the 3DV reference buffer 316 and/or the layer encoders 304-314 can be configured to perform one or more of steps 2604-2606. The encoder 300 may alternatively or additionally comprise a processor configured to perform at least method 2600. In addition, embodiments can include a video signal and/or a video signal structure that is formatted to include the multiple encoded pictures generated at step 2602, the syntax elements generated at step 2604, and/or any one or more elements included in the bitstream generated at step 2606, including the bitstream itself. Moreover, embodiments may include a processor readable medium that has the video signal structure stored thereon. Additionally, as indicated above, a modulator 722 of FIG. 7 can be configured to modulate the video signal. Furthermore, embodiments may include a processor readable medium having stored thereon instructions for causing the processor to perform at least method 2600.

Referring again to the method 2700 of FIG. 27 for decoding 3DV content, method 2700 may begin at step 2702, in which the decoder 400 may access encoded multiple pictures from a bitstream. The multiple pictures describe different 3D information for a given view at a given time. For example, the bitstream may correspond to the bitstream generated in accordance with method 2600. As discussed above with regard to method 2600, any 2D view layer and/or any depth layer discussed above with respect to FIGS. 11 and/or 12, can each constitute a separate picture. Moreover, other 3DV supplemental layers, as discussed above, not explicitly illustrated in FIGS. 11 and 12 may also each constitute a separate picture. Furthermore, any one or more of the 3DV views discussed above may constitute a given view at a given time, such as 3D views 0, 1 and 2 at times T0 and T1, discussed above with regard to FIGS. 11 and 12.

At step 2704, the decoder 400 can access syntax elements from the bitstream. The syntax elements indicate for the encoded multiple pictures how the encoded picture fits into a structure that supports 3D processing. The structure provides a defined relationship between the multiple pictures. For example, the 3DV reference buffer 414 can access syntax elements in accordance with any one or more of embodiments 1, 2 and/or 3. The syntax elements may, for example, be the 3DV prefix unit 16 discussed above with regard to embodiment 1, the NAL prefix unit 14 and/or the NAL unit 20 of embodiment 2, and/or the NAL unit 21 of embodiment 3. As discussed above, the novel NAL units according to embodiments 1, 2 and 3 can indicate, for encoded 3DV layers, how each layer fits into a structure, such as structure 1000 of FIG. 10, that supports 3D processing. Further, use of a novel NAL unit, such as NAL units 16 and 21, can indicate that a 3DV structure, such as that illustrated in FIG. 10, has been used in the bitstream. As noted above, the structure 1000 can define different content types, such as different types of 3DV layers. It should be understood that a structure can correspond to a set of 3DV views, as indicated in FIG. 10, and/or can correspond to a set of layers within a 3DV view. It should also be understood that decoder 400 can decode a picture using a different encoded picture as a reference, thereby permitting inter-layer decoding between pictures of different content types. For example, using FIG. 10 as an example, a depth view of view 1004 can be dependent from and reference a different layer, such as the 2D view of view 1004, thereby permitting inter-layer decoding. In addition, the coding structure of FIG. 10 can be configured such that a 2D view of view 1004 can be dependent from and reference a different layer, such as a depth layer, of view 1006. Other types of inter-layer coding are possible, as indicated above, and can be implemented by one of ordinary skill in the art in view of the teachings provided herein. Moreover, as discussed above with respect to embodiments 1-3, any one or more of 3DV prefix unit 16 of embodiment 1, the NAL prefix unit 14 and/or the NAL unit 20 of embodiment 2, and/or the NAL unit 21 of embodiment 3 can provide a defined relationship between the pictures of the bitstream through the use of 3DV view IDs and 3DV layer IDs, as discussed above. For example, the decoder 400 can be preconfigured to combine pictures in accordance with a 3DV structure, such as structure 1000 of FIG. 10, and can use the 3DV view IDs and 3DV layer IDs to identify which received pictures correspond to the different layers in the pre-defined structure.

At step 2706, the decoder 400 can be configured to decode the encoded multiple pictures. For example, the decoder 400 can decode the received pictures using layer decoders 402-412, as discussed above, for example, with respect to FIGS. 4 and 6. For example, the decoder 400 can use the defined relationship indicated and provided by the syntax elements to render an additional picture that references one or more of a two-dimensional (2D) video layer picture, a depth layer picture, an occlusion layer picture, or a transparency picture. For example, as discussed above, a depth view of view 1004 of FIG. 10 can be dependent from and reference a different layer, such as the 2D view of view 1004, thereby providing inter-layer coding. Thus, the decoder 400 can render an additional picture, such as a depth view of view 1004, from one or more of a variety of different layer pictures.

At step 2708, the decoder 400 may provide the decoded pictures in an output format that indicates the defined relationship between the multiple pictures. For example, the 3DV reference/output buffer 414 of decoder 400 can output 3DV content that is formatted in accordance with the 3DV structure. Thus, the output can indicate to a display device the relationships between multiple pictures in accordance with the structure to permit proper display of the 3DV content on a display device and enable a user to view the 3DV content. In particular, the output format may include syntax elements that specify how a decoded picture fits into a structure. Examples of such syntax elements may include any one or more of 3DV prefix unit 16 of embodiment 1, the NAL prefix unit 14 and/or the NAL unit 20 of embodiment 2, and/or the NAL unit 21 of embodiment 3.

Optional steps 2710-2714 may be performed at a decoder after performing step 2708. Implementations may perform one or more of steps 2710-2714 as part of step 2708 and/or as part of the decoding of step 2706. In various implementations, one or more of steps 2710-2714, particularly step 2714, may be performed at a display.

Optionally, at step 2710, the decoder 400 can identify a 2D video picture from the multiple pictures using the syntax elements. For example, the decoder 400 may identify a 2D video picture or layer by parsing any one or more of a 3DV prefix unit 16 of embodiment 1, the NAL prefix unit 14 and/or the NAL unit 20 of embodiment 2, and/or the NAL unit 21 of embodiment 3, implemented to encode 3DV layers. The decoder 400 may further determine which of the encoded pictures have a 2D view layer ID, which was denoted above as ‘0,’ and determine the corresponding 3DV view using the 3DV view ID.

Optionally, at step 2712, the decoder 400 can identify a depth picture from the multiple pictures using the syntax elements. For example, the decoder 400 may identify a depth picture or layer by parsing any one or more of a 3DV prefix unit 16 of embodiment 1, the NAL prefix unit 14 and/or the NAL unit 20 of embodiment 2, and/or the NAL unit 21 of embodiment 3, implemented to encode 3DV layers. Moreover, the decoder 400 can determine which of the encoded pictures have a depth layer ID, which was denoted above as ‘1,’ and determine the corresponding 3DV view using the 3DV view ID. It should be noted that other 3DV supplemental layers can be identified using syntax elements in accordance with various embodiments.
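By way of illustration only, the identification performed in steps 2710 and 2712 can be sketched as a lookup keyed on the 3DV view ID and the 3DV layer ID. The layer ID values 0 (2D video) and 1 (depth) follow the denotation given above; the `CodedPicture` type, its field names, and the helper functions are hypothetical and are not part of any syntax described herein.

```python
from collections import defaultdict
from typing import NamedTuple

LAYER_2D_VIDEO = 0  # 2D view layer ID, denoted '0' above
LAYER_DEPTH = 1     # depth layer ID, denoted '1' above


class CodedPicture(NamedTuple):
    """Hypothetical container for one parsed picture."""
    view_id: int    # 3DV view ID taken from the parsed syntax elements
    layer_id: int   # 3DV layer ID taken from the parsed syntax elements
    payload: bytes  # coded picture data (opaque in this sketch)


def group_by_view(pictures):
    """Index pictures as views[view_id][layer_id] -> picture."""
    views = defaultdict(dict)
    for pic in pictures:
        views[pic.view_id][pic.layer_id] = pic
    return dict(views)


def video_and_depth(views, view_id):
    """Return the (2D video, depth) pictures of a view; None where absent."""
    layers = views.get(view_id, {})
    return layers.get(LAYER_2D_VIDEO), layers.get(LAYER_DEPTH)


pictures = [
    CodedPicture(0, LAYER_2D_VIDEO, b"v0"),
    CodedPicture(0, LAYER_DEPTH, b"d0"),
    CodedPicture(2, LAYER_2D_VIDEO, b"v2"),
]
views = group_by_view(pictures)
video, depth = video_and_depth(views, 0)
```

Other 3DV supplemental layers could be indexed in the same way by adding further layer ID constants.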

Optionally, at step 2714, the decoder 400 can render a new picture for an additional view based on the 2D video picture and the depth picture. For example, the identified pictures or views may correspond to 2D view 1010 and depth view 1008 of FIG. 10. In addition, 3DV views 1004 and 1006 can, for example, be rendered by using 2D view 1010 and depth view 1008 of 3DV base view 1002 as a reference in accordance with the description provided above with regard to FIG. 10. Similarly, the 2D video layer and depth layer of 3DV view 1006 can be used as a reference to render 3DV view 1004 in accordance with the description provided above with regard to FIG. 10.

It should be understood that in accordance with various embodiments, the set of layer decoders 402-412 of decoder 400 can be configured to perform steps 2702 and 2706. Further, the 3DV reference buffer 414 and/or the layer decoders 402-412 can be configured to perform one or more of steps 2704 and 2708. The decoder 400 may alternatively or additionally comprise a processor configured to perform at least method 2700. Moreover, as indicated above, a demodulator 822 of FIG. 8 can be configured to demodulate a video signal including a bitstream from which multiple encoded pictures are accessed in step 2702. Furthermore, embodiments may include a processor readable medium having stored thereon instructions for causing the processor to perform at least method 2700.

With reference now to FIG. 28, a method 2800 for constructing a reference picture list is illustrated. It should be understood that any one or more aspects discussed herein, and combinations thereof, with respect to embodiments can be implemented in or with method 2800. For example, as discussed further herein below, embodiment 4 can be implemented in and by method 2800. In addition, any one or more of embodiments 1-3 and 5-7 can be combined with embodiment 4 and implemented in or with method 2800. Furthermore, it should also be noted that encoder 300 of FIG. 3 and/or decoder 400 of FIG. 4 can be used to implement method 2800. Moreover, although method 2800 describes constructing a reference picture list for a picture, such a reference list may be constructed for a sequence of pictures, for a set of pictures across multiple views or for a subset of a picture, as discussed above with regard to embodiment 4.

Method 2800 may begin at optional step 2802, in which the encoder 300 or the decoder 400 can determine an inter-layer reference for a picture based on dependency information for the picture. For example, the decoder 400 may extract and decode the dependency information from received syntax elements conveying a sequence parameter set (SPS), as discussed above. In turn, for encoder 300, the dependency information may be the same as the dependency information the encoder 300 included in the SPS, as discussed above, for example, with respect to embodiments 5-8. For example, the encoder 300 may obtain the dependency information from a configuration file that is stored on the encoder. It should be understood that the dependency information may include any one or more of temporal dependencies, inter-view dependencies and inter-layer dependencies indicating how different pictures and picture types are predictively encoded. Thus, based on the dependency information, the encoder 300 or decoder 400 can determine an inter-layer reference for the picture for which a reference picture list is being constructed. In addition, the inter-layer reference may conform to inter-layer references discussed above with regard to embodiment 4. For example, the inter-layer reference may correspond to any one or more of the structures discussed above with regard to FIG. 15.

At step 2804, the encoder 300 or decoder 400 may determine a priority of the inter-layer reference relative to one or more other references for the picture. For example, the encoder 300 or decoder 400 may be configured to apply a priority scheme to prioritize pictures in the reference picture list. For example, as discussed above with regard to embodiment 4, the pictures in the reference list may be ordered/prioritized in accordance with the degree of redundancy the picture for which the reference picture list is constructed has with the pictures listed in its reference picture list. For example, as discussed above with regard to a depth picture, the inter-layer reference is expected to have the least redundancy as compared to temporal and inter-view references in the reference list. Thus, the inter-layer reference has a lower priority than the temporal and inter-view references. It should be noted that any of the priorities provided above with regard to the different 3DV layer types in embodiment 4 can be applied here in step 2804. However, it should also be understood that different priorities may also be employed in accordance with various aspects described herein. For example, the priorities may vary in accordance with the actual redundancy between picture references and the picture associated with the reference picture list for the 3DV content. For example, redundancies can be determined based on measurements of the pictures or layers composing the 3DV content and the priority scheme can be tailored to reflect the measured redundancy levels such that reference pictures having a higher redundancy are given higher priority over reference pictures having a lower redundancy with the picture associated with the reference list. Furthermore, such priority schemes may, in other aspects or embodiments, be devised differently for each picture or reference picture list.

At step 2806, the encoder 300 or the decoder 400 may include the inter-layer reference in an ordered list of references for the picture based on the priority. For example, inter-layer reference pictures with a lower or lowest priority may be included after other reference pictures with a higher priority or at the end of the list. In turn, inter-layer reference pictures with a higher or highest priority are included before other reference pictures with a lower priority or at the beginning of the list. Such references may include a temporal and/or an inter-view reference, as discussed above. As indicated above, the inter-layer references may be included in the list of references in accordance with method 1600 for the encoder implementation or method 1700 for the decoder implementation. Further, the inter-layer reference may be included in the list of references in accordance with other priority schemes, as discussed above with respect to step 2804. It should be noted that the lists may be ordered and prioritized based on expected use so that smaller indices can be used for more common references and bits can be saved in transmission.
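As a non-limiting sketch, steps 2804 and 2806 can be pictured as assigning each reference a priority rank and sorting the references by that rank, so that the highest-priority references receive the smallest list indices. The `Reference` type, the reference names, and the example redundancy scores below are hypothetical. The relative order of temporal and inter-view references in `DEPTH_PRIORITY` is an assumption made for illustration; the description above only states that, for a depth picture, the inter-layer reference ranks below both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    name: str  # hypothetical label for the reference picture
    kind: str  # "temporal", "inter_view" or "inter_layer"


# Assumed scheme for a depth picture: a smaller rank means a higher priority.
DEPTH_PRIORITY = {"temporal": 0, "inter_view": 1, "inter_layer": 2}


def build_reference_list(references, priority):
    """Order references so that higher-priority kinds get smaller indices.

    sorted() is stable, so references of equal priority keep their input order.
    """
    return sorted(references, key=lambda ref: priority[ref.kind])


def priority_from_redundancy(redundancy):
    """Derive ranks from measured redundancy scores (higher score ranks first)."""
    ranked = sorted(redundancy, key=redundancy.get, reverse=True)
    return {kind: rank for rank, kind in enumerate(ranked)}


refs = [
    Reference("video_same_view", "inter_layer"),
    Reference("depth_previous_time", "temporal"),
    Reference("depth_other_view", "inter_view"),
]
ordered = build_reference_list(refs, DEPTH_PRIORITY)

# Alternative scheme tailored to (made-up) measured redundancy levels.
measured = priority_from_redundancy(
    {"temporal": 0.4, "inter_view": 0.2, "inter_layer": 0.9}
)
reordered = build_reference_list(refs, measured)
```

Under the assumed depth scheme the inter-layer reference lands at the end of the list; under the measured scheme it moves to index 0, which is the position that costs the fewest bits to signal.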

At optional step 2808, the encoder 300 or the decoder 400 may use the inter-layer reference in a coding operation involving the picture. For example, the encoder 300 may perform a predictive encoding operation to encode the picture for which the reference list was constructed using the inter-layer reference as a reference picture. In turn, the decoder 400 may perform a predictive decoding operation to decode the picture for which the reference list was constructed using the inter-layer reference as a reference picture. Thus, encoding or decoding of the picture may, at least in part, be based on the inter-layer reference.

Optionally, at step 2810, the encoder 300 or decoder 400 may generate a bitstream that includes the coded picture. For example, the encoder 300 may include the encoded picture in bitstream 318 in accordance with the discussion provided above with regard to FIGS. 3 and 5. In addition, the decoder 400 may include the decoded picture in bitstream 416 in accordance with the discussion provided above with regard to FIGS. 4 and 6.

Thereafter, the method may end or may repeat such that the encoder 300 or the decoder 400 may generate a reference picture list for another picture or may generate a second reference picture list for the same picture if the picture is a B picture.

One implementation performs only steps 2804 and 2806. An inter-layer reference may be provided, for example, and the implementation determines a priority of the inter-layer reference. The implementation then includes the inter-layer reference in an ordered list, based on the determined priority.

Returning to step 2802, optionally, step 2802 may include the performance of method 2900 provided in FIG. 29 for processing 2D video layer pictures. For example, method 2900 may begin at step 2902, in which the encoder 300 or decoder 400 may determine whether the picture for which the reference picture list is constructed is a 2D video layer picture. If the picture is not a 2D video layer picture, then the method may proceed to step 2804 of method 2800. Otherwise, the method may proceed to step 2904, in which the encoder 300 or decoder 400 may exclude any inter-layer reference from the reference picture list. For example, as discussed above with regard to embodiment 4, refraining from using inter-layer references for the 2D video layer may permit a conventional MVC to extract 3DV content and format the content for display. Thereafter, the method may proceed to step 2804 of method 2800.

Step 2904 may also be modified to exclude only depth layers from being used as references for 2D video layers. Such an implementation may, for example, rely on occlusion video layers as inter-layer references for 2D video layers.
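The decision of steps 2902 and 2904, together with the depth-only variant just described, can be sketched as a filter over candidate references. The layer names, the `(kind, layer)` tuple representation, and the `exclude_only_depth` flag are hypothetical and are used here only to make the control flow concrete.

```python
def filter_inter_layer_references(picture_layer, candidates, exclude_only_depth=False):
    """Return the candidates allowed in the reference picture list.

    picture_layer: layer type of the picture whose list is being built.
    candidates: list of (kind, layer) tuples, where kind is "temporal",
        "inter_view" or "inter_layer" and layer is the reference's layer type.
    exclude_only_depth: if True, apply the modified step 2904 and drop only
        inter-layer references that are depth layers.
    """
    # Step 2902: pictures that are not 2D video layer pictures pass through.
    if picture_layer != "2d_video":
        return list(candidates)

    # Step 2904: drop inter-layer references (all of them, or depth only).
    kept = []
    for kind, layer in candidates:
        if kind == "inter_layer" and (not exclude_only_depth or layer == "depth"):
            continue
        kept.append((kind, layer))
    return kept


candidates = [
    ("temporal", "2d_video"),
    ("inter_layer", "depth"),
    ("inter_layer", "occlusion_video"),
]
for_depth = filter_inter_layer_references("depth", candidates)
for_video = filter_inter_layer_references("2d_video", candidates)
for_video_variant = filter_inter_layer_references(
    "2d_video", candidates, exclude_only_depth=True
)
```

In either case the filtered candidates would then be prioritized and ordered at steps 2804 and 2806.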

It should be understood that in accordance with various embodiments, a set of layer coders, such as layer decoders 402-412 of decoder 400 or layer encoders 304-314 of encoder 300, can be configured to perform steps 2808 and 2810. Further, the 3DV reference buffer 414, the 3DV reference buffer 316, and/or the layer coders can be configured to perform one or more of steps 2802-2806 and 2810. The encoder 300 or the decoder 400 may alternatively or additionally comprise a processor configured to perform at least method 2800. Moreover, embodiments may include a processor readable medium having stored thereon instructions for causing the processor to perform at least method 2800.

With reference now to FIGS. 30 and 31, methods 3000 and 3100 for encoding and decoding 3DV content, such that 3DV inter-layer dependency structures are conveyed, are illustrated. It should be understood that any one or more aspects discussed herein, and combinations thereof, with respect to various embodiments can be implemented in or with methods 3000 and 3100. For example, as discussed further herein below, embodiments 5-7 can be implemented in and by methods 3000 and 3100. Furthermore, it should also be noted that encoder 300 of FIG. 3 and decoder 400 of FIG. 4 can be used to implement methods 3000 and 3100, respectively.

Method 3000 can begin at step 3002, in which the encoder 300 may generate syntax elements indicating an inter-layer dependency structure among 3DV layers. For example, the syntax elements may be generated as discussed above with regard to any one or more of embodiments 5-7. For example, NAL units 17 may be employed as the syntax elements to convey an inter-dependency structure, as discussed above with regard to embodiment 5. Furthermore, the inter-dependency structure may be conveyed as discussed above with regard to embodiments 6 and 7 and with regard to Tables 13 and 14. For example, any one or more of methods 2000, 2200 and 2400 may be employed to convey the inter-dependency structure. For example, the syntax elements may explicitly convey the inter-layer dependency structure, as discussed above with regard to embodiment 6, or the syntax elements may indicate the inter-layer dependency structure by conveying whether particular 3DV layers are present for each 3DV view using 3DV layer ids, where the inter-layer dependency is pre-defined, as discussed above with regard to embodiment 7. In addition, the inter-layer dependency structure may correspond to one of many different inter-layer dependency structures. For example, the inter-layer dependency structure may correspond to that described above with regard to FIG. 15 as well as that discussed above with regard to embodiment 7. Moreover, as stated above, the inter-layer dependency structure may be provided in any one or more of the NAL unit header, SPS, PPS, SEI or a slice header. Further, the encoder 300 may generate syntax elements by constructing and employing reference picture lists, as discussed above, for example, with regard to embodiment 4.

At step 3004, the encoder 300 may identify, based on the inter-layer dependency structure, an inter-layer reference for a picture from a layer of the 3DV layers. For example, if the inter-layer dependency structure corresponds to that described above with regard to FIG. 15, to encode a depth layer picture, the encoder 300 may employ a reference picture list, which may be constructed at step 3002, to determine that an inter-layer reference for the depth layer picture is a 2D video layer picture in the same view or 3DV view as the depth layer picture. As noted above, the inter-dependency structure can vary and can include many different types of layers, such as a 2D video layer, depth layer, occlusion video layer, occlusion depth layer and transparency layer, among others, with different inter-dependencies, including, for example, inter-layer dependencies between different 3DV views.
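One way to picture step 3004 is as a table lookup, sketched below under stated assumptions. The `DEPENDENCY` table is a hypothetical, simplified stand-in for an inter-layer dependency structure: it records only the one relationship stated above (a depth layer picture references the 2D video layer picture of the same 3DV view) and does not reproduce FIG. 15 or any conveyed syntax. The function name and the `(view_id, layer)` representation are likewise illustrative.

```python
# Hypothetical structure: each layer maps to the layers it may reference
# within the same 3DV view at the same time instance.
DEPENDENCY = {
    "2d_video": [],
    "depth": ["2d_video"],
}


def inter_layer_references(structure, view_id, layer, available):
    """List the inter-layer references of (view_id, layer) that are available.

    available: set of (view_id, layer) pairs already coded for this time
        instance, which is all that can serve as a reference.
    """
    return [
        (view_id, ref_layer)
        for ref_layer in structure.get(layer, [])
        if (view_id, ref_layer) in available
    ]


available = {(0, "2d_video"), (1, "2d_video")}
depth_refs = inter_layer_references(DEPENDENCY, 0, "depth", available)
video_refs = inter_layer_references(DEPENDENCY, 0, "2d_video", available)
missing_refs = inter_layer_references(DEPENDENCY, 2, "depth", available)
```

A structure with inter-layer dependencies between different 3DV views would need a richer key than the layer name alone, for example a pair of view and layer on both sides of the mapping.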

At step 3006, the encoder 300 can encode the picture based, at least in part, on the inter-layer reference. For example, the encoder 300 may encode the picture as discussed above with regard to FIGS. 3 and 5 using encoders 304-314. Here, again using structure 1500 and the depth layer as an example, the depth layer may be encoded based, at least in part, on the 2D video layer, as discussed above.

At optional step 3008, the encoder 300 can generate a bitstream that includes the encoded picture. For example, the encoded bitstream may be generated as discussed above with regard to FIGS. 3 and 5 and may correspond to, for example, bitstream 318.

At optional step 3010, the encoder 300 may provide the encoded picture and the syntax elements for use in decoding the encoded picture. For example, the syntax elements and the encoded picture may be transmitted via bitstream 318 to a decoder 400. Alternatively, the syntax elements may be transmitted in a bitstream that is separate from a bitstream used to transmit 3DV data content. Thus, bitstream 318 in FIG. 3 may represent two separate corresponding bitstreams. Alternatively, the different bitstreams may be transmitted separately. For example, one bitstream may be transmitted to a decoder 400 via a cable network while the other bitstream may be transmitted to the decoder 400 wirelessly. In addition, the syntax elements may be used to decode the encoded picture as discussed herein below with respect to method 3100.

It should be understood that in accordance with various embodiments, the set of layer encoders 304-314 of encoder 300 can be configured to perform step 3006. Further, the 3DV reference buffer 316 and/or the layer encoders 304-314 can be configured to perform one or more of steps 3002, 3004, 3008 and 3010. The encoder 300 may alternatively or additionally comprise a processor configured to perform at least method 3000. In addition, embodiments can include a video signal and/or a video signal structure that is formatted to include the encoded picture, the syntax elements and/or the bitstream generated in accordance with method 3000. Moreover, embodiments may include a processor readable medium that has the video signal structure stored thereon. Additionally, as indicated above, a modulator 722 of FIG. 7 can be configured to modulate the video signal. Furthermore, embodiments may include a processor readable medium having stored thereon instructions for causing the processor to perform at least method 3000.

One implementation performs only steps 3002-3006. The implementation generates the syntax elements, identifies an inter-layer reference for a picture, and then encodes the picture based, at least in part, on the identified inter-layer reference. The implementation does not, in this case, need to generate a bitstream including the encoded picture, or to provide the encoded picture and syntax elements for use in decoding.

Referring again to the method 3100 of FIG. 31 for decoding 3DV content, method 3100 may begin at step 3102. Decoder 400 may access an encoded picture from a bitstream, where the picture describes 3DV information for a particular 3DV layer, from a given view, at a given time. For example, the encoded picture can correspond to any one or more 3DV layers discussed above. For example, a 2D view 1010 and a depth view 1008 can each constitute a separate picture. Additionally, any 2D view layer 1118, 1122, 1136, 1218, 1222, 1236 and/or any depth layer 1120, 1124, 1220, 1224, discussed above with respect to FIGS. 11 and/or 12, can each constitute a separate picture. Moreover, other 3DV supplemental layers, as discussed above, not explicitly illustrated in FIGS. 11 and 12 may also each constitute a separate picture. Furthermore, any one or more of the 3DV views discussed above may constitute a given view at a given time, such as 3D views 0, 1 and 2 at times T0 and T1, discussed above with regard to FIGS. 11 and 12. Further, the encoded picture may be the encoded picture generated by method 3000.

At step 3104, the decoder 400 may access syntax elements indicating an inter-layer dependency structure for a set of 3DV layers that includes the particular 3DV layer. For example, NAL units 17 may be the syntax elements that indicate an inter-dependency structure, as discussed above with regard to embodiment 5. Furthermore, the inter-dependency structure may be indicated or conveyed as discussed above with regard to embodiments 6 and 7 and with regard to Tables 13 and 14. For example, any one or more of methods 2000, 2200 and 2400 may be employed to convey or indicate the inter-dependency structure.

For example, the syntax elements may explicitly convey the inter-layer dependency structure, as discussed above with regard to embodiment 6. Or the syntax elements may indicate the inter-layer dependency structure by conveying whether particular 3DV layers are present for each 3DV view using 3DV layer ids, where the inter-layer dependency is pre-defined, as discussed above with regard to embodiment 7. In addition, the inter-dependency structure may correspond to one of many different inter-dependency structures. For example, the inter-dependency structure may correspond to that described above with regard to FIG. 15 as well as that discussed above with regard to embodiment 7. Moreover, as stated above, the inter-dependency structure and the syntax elements may be obtained from any one or more of the NAL unit header, SPS, PPS, SEI or a slice header. Further, the decoder may access the syntax elements, for example, as discussed above with regard to any one or more of methods 2100, 2300 and 2500.

At step 3106, the decoder 400 may decode the encoded picture based, at least in part, on the inter-layer dependency structure. For example, the decoder 400 may decode the encoded picture as discussed above with regard to FIGS. 4 and 6. Further, the decoder 400 may construct and employ one or more reference picture lists using the syntax elements, as discussed above, for example, with regard to embodiment 4, to decode the encoded picture. Thus, the decoder 400 may determine the encoded picture's references for predictive coding purposes and may decode the picture based at least in part on its references.

At optional step 3108, the decoder 400 may provide the decoded pictures in an output format that indicates the inter-layer dependency structure. For example, the 3DV reference/output buffer 414 of decoder 400 can output 3DV content that is formatted in accordance with the inter-layer dependency structure. Thus, the output can indicate to a display device the relationships between multiple pictures in accordance with the structure to permit proper display of the 3DV content on a display device and enable a user to view the 3DV content. In particular, the output format may include syntax elements that specify how a decoded picture fits into the structure. Examples of such syntax elements may include NAL unit 17, as discussed above.

It should be understood that in accordance with various embodiments, the set of layer decoders 402-412 of decoder 400 can be configured to perform step 3106. Further, the 3DV reference buffer 414 and/or the layer decoders 402-412 can be configured to perform one or more of steps 3102, 3104 and 3108. The decoder 400 may alternatively or additionally comprise a processor configured to perform at least method 3100. Moreover, as indicated above, a demodulator 822 of FIG. 8 can be configured to demodulate a video signal including a bitstream from which multiple encoded pictures are accessed in step 3102. Furthermore, embodiments may include a processor readable medium having stored thereon instructions for causing the processor to perform at least method 3100.

It should be understood that the embodiments discussed above may be combined in a variety of ways by those of ordinary skill in the art in view of the teachings provided herein. For example, with reference now to FIG. 32, a NAL unit stream 3200 incorporating features from several embodiments discussed above is illustrated. Here, stream 3200 may include NAL unit 15 (3202) for a subset sequence parameter set for MVC, as provided above in Table 3 and defined in the AVC draft. In addition, stream 3200 may further include NAL unit 17 for the extended SPS for 3DV indicating at least one inter-layer dependency structure as discussed above with regard to embodiments 5-7. Here, for simplicity purposes, the inter-layer dependency structure shown in FIG. 10 is employed in stream 3200.

Similar to FIGS. 11 and 12, FIG. 32 provides sets of 3DV views corresponding to a time T0 and time T1, respectively. The truncation of FIGS. 11 and 12 discussed above is also applied to FIG. 32 and the arrows of FIG. 32 indicate the transmission order of NAL units, similar to the arrows of FIGS. 11 and 12. Of course, FIG. 32 is a small excerpt of the stream 3200. Stream 3200 would comprise many more NAL units for a multitude of different time instances in a practical application. In addition, the use of three 3DV views is an example and many more views may be employed and/or rendered at a decoder, as understood by those of ordinary skill in the art familiar with MVC. Furthermore, the use of two 3DV layers for each view is also an example and it should be understood that several additional 3DV layers may be employed, as discussed at length above.

In the excerpt of stream 3200, three 3DV views 3206, 3208, and 3210 correspond to time T0 while three 3DV views 3212, 3214, and 3216 correspond to time T1. Similar to FIGS. 11 and 12, 3DV view 0 (3206, 3212) can correspond to base view 1002 in FIG. 10, while 3DV view 2 (3208, 3214) and 3DV view 1 (3210, 3216) may correspond to P view 1006 and B view 1004 of FIG. 10, respectively. 3DV view 3206 may comprise NAL units 16 (3220), 14 (3222), and 5 (3224), composing a 2D video layer 3218. As discussed above, a NAL unit 5 includes video data of a coded slice of an instantaneous decoding refresh (IDR) picture and is composed of only intra slices or SI slices, as defined in the AVC draft. In addition, NAL unit 14 may include, as an MVC prefix, a reference denoting the 2D video layer 3218 as a base layer for other views in accordance with MVC. In another implementation, in which a stereo profile is used, NAL units 14 and 17 may be omitted.

A NAL unit 16 may, for example, include a 3DV view ID and a 3DV layer ID as discussed above with regard to embodiment 1. Here, the 3DV view ID and the 3DV layer ID may, for example, be used by a decoder 400 to identify the 2D video layer 3218 as an inter-layer reference for depth layers, or for other 3DV layers. As shown in FIG. 32, 3DV view 3206 may further include a depth layer 3226 composed of NAL unit 21 (3228), described above with regard to embodiment 3. As discussed above with regard to embodiment 3, a NAL unit 21 may include a 3DV view ID and a 3DV layer ID in addition to other information provided in the MVC NAL unit header extension.

As discussed above with regard to embodiments 4-7, a decoder 400 may reconstruct a reference picture list using the information provided in the SPS, such as the inter-layer dependency structure provided by NAL unit 17, and use the reference picture list to properly decode 3DV content. For example, based on the 3DV view ID and the 3DV layer ID, the decoder 400 may determine the corresponding layer's (in this case depth layer 3226) role in the inter-layer dependency structure conveyed in the SPS. Here, the 3DV view ID and the 3DV layer ID may indicate that the 2D video layer 3218 should be used as a reference to decode the depth layer 3226.

As also shown in FIG. 32, each other 3DV view for time T0 is composed of NAL units 20 and 21 corresponding to a 2D video layer and a depth layer, respectively, in the 3DV view. The NAL units within views 3208 and 3210 may have the same function as the NAL units in views 1206 and 1208, as discussed above with regard to FIG. 12. Similarly, the set of 3DV views of time T1 is structured in essentially the same way as the set of 3DV views for time T0 except that NAL unit 5 in 3DV view 3206 is replaced with NAL unit 1 in 3DV view 3212. As discussed above with regard to embodiment 3, a NAL unit 1 includes video data of a coded slice of a non-IDR picture.

With reference now to FIG. 33, a system 3300 for managing network traffic by employing inter-layer dependency structures is illustrated. System 3300 may include transmission system/apparatus 700 and receiving system/apparatus 800 described above with respect to FIG. 7 and FIG. 8. In particular, the encoder 710 of transmission system/apparatus 700 may be implemented by the encoder 300 discussed above with regard to the various implementations described herein. Similarly, the decoder 820 of receiving system/apparatus 800 may be implemented by the decoder 400 discussed above with regard to the various implementations described herein. The input and output of the system 3300 are listed, in FIG. 33, as “input video(s)” and “output video”. It should be clear that, at least in this implementation, these refer to 3D videos that include multiple layers.

System 3300 may further include a network device/system 3301 provided in a network 3350 between the transmission system/apparatus 700 and receiving system/apparatus 800. The network 3350 may, for example, be a wired network, such as the internet, a wide area network or a local area network (LAN), or a wireless network, such as a wireless cellular network or a wireless LAN. In turn, the network device 3301 may be implemented as a router in a wired network or as a base station in a wireless network. As illustrated in FIG. 33, the network device 3301 may include a parser 3302, a controller 3304, a network traffic monitor 3306 and a forwarding module 3308. Additionally, each element of network device 3301 may be implemented as a hardware element or as a combination of software and hardware. Network device 3301 and the functions of its elements are described in more detail below with regard to method 3400 of FIG. 34, which may be implemented by network device 3301.

Referring to FIG. 34 with continuing reference to FIG. 33, a method 3400 for managing network resources is provided. Method 3400 may begin at step 3402, in which a parser 3302 may parse received syntax elements indicating an inter-layer dependency structure for 3DV layers to determine forwarding priorities for at least a subset of the 3DV layers based on the structure. For example, the syntax elements may be received from transmission system/apparatus 700 in NAL units 17, which may indicate the inter-dependency structure in accordance with, for example, Tables 13 or 14, as discussed above with regard to embodiments 5-7.

Here, the parser 3302 may determine forwarding priorities in accordance with the importance of 3DV layers as indicated in the inter-dependency structure. For example, the 3DV layer IDs may be configured such that the lowest number corresponds to the highest priority while the highest number corresponds to the lowest priority. If the inter-dependency structure of FIG. 15 and the 3DV layer identifiers of Table 1 are employed, the parser 3302 may determine that the 2D video layer has the highest priority, the depth layer has the next highest priority, etc., based on the 3DV layer identifiers. In particular, the 3DV layer identifiers may be ordered in accordance with the importance of each layer's contribution to providing 3DV content. For example, referring to FIG. 2, the depth layer 204 of the 2D video may be considered more important than the occlusion video layer 206 or the occlusion depth layer 208, because it provides a three-dimensional effect to the main object in a view, whereas neither the occlusion video layer 206 nor the occlusion depth layer 208 does. Variations of determining importance between different 3DV layers can be applied.
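As a minimal, non-limiting sketch, the identifier-based ordering described above may be expressed as follows. The numeric layer IDs and the names `LAYER_IDS` and `priorities_from_layer_ids` are assumptions made for illustration; the values are chosen only to be consistent with the ordering described for Table 1 and are not reproduced from it.

```python
# Hypothetical 3DV layer IDs (assumed for illustration): a lower ID
# denotes a more important layer, as in the ordering described above.
LAYER_IDS = {
    "2d_video": 0,
    "depth": 1,
    "occlusion_video": 2,
    "occlusion_depth": 3,
    "transparency": 4,
}


def priorities_from_layer_ids(layer_ids):
    """Return layer names ordered from highest to lowest forwarding priority."""
    # The lowest ID is given the highest priority.
    return sorted(layer_ids, key=lambda name: layer_ids[name])


order = priorities_from_layer_ids(LAYER_IDS)
print(order)
```

In this sketch no dependency information is consulted at all; the priority follows directly from the identifier, which is why the identifiers themselves would need to be assigned in order of importance.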

For example, one variation may be to base the priority of the layers on the number of references the layers have in the inter-layer dependency structure or the number of reference picture lists in which a particular layer may be included. For example, in response to determining the inter-layer dependency structure and/or the corresponding inter-layer references, the parser 3302 may allocate the highest priority to the layer that is referenced by the most layers while allocating the lowest priority to layers that are referenced the least. For example, in the inter-layer dependency structure of FIG. 15, the 2D video layer would have the highest priority, as it is referenced by three layers, while the depth layer, which is referenced by one layer, would have the next highest priority, etc.
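The counting variation above may be sketched as follows. The dependency map `DEPENDENCIES` is an assumption: it is constructed only so that the 2D video layer is referenced by three layers and the depth layer by one, matching the counts stated above, and it is not a reproduction of FIG. 15. The function name is likewise hypothetical.

```python
from collections import Counter

# Assumed inter-layer dependency map: layer -> layers it references.
# Built to match the stated counts (2D video referenced by three layers,
# depth referenced by one); not taken from the figure itself.
DEPENDENCIES = {
    "2d_video": [],
    "depth": ["2d_video"],
    "occlusion_video": ["2d_video"],
    "occlusion_depth": ["depth"],
    "transparency": ["2d_video"],
}


def rank_by_times_referenced(dependencies):
    """Order layers so that the most-referenced layer comes first."""
    counts = Counter(ref for refs in dependencies.values() for ref in refs)
    # sorted() is stable, so layers with equal counts keep their input order.
    return sorted(dependencies, key=lambda layer: -counts[layer])


ranked = rank_by_times_referenced(DEPENDENCIES)
print(ranked)
```

Layers that are never referenced tie at a count of zero; how such ties are broken is left open above, and this sketch simply preserves the input order.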

Other variations may be directed to ordering the 3DV layers in accordance with the number of references employed to properly encode/decode the layer. For example, in the inter-dependency structure described above with regard to embodiment 7, the occlusion video, occlusion depth, and the transparency layers may be given the lowest priority because they each employ two (inter-layer) references, the depth layer may be given the next higher priority because it employs one (inter-layer) reference and the 2D video layer may be given the highest priority because it does not employ any (inter-layer) references.
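This variation may be sketched in the same style. The map `REFERENCES_USED` is hypothetical: the text above states only how many inter-layer references each layer employs, so the specific referenced layers listed here are assumed for illustration.

```python
# Assumed map: layer -> inter-layer references it employs. Only the
# counts (0, 1, 2, 2, 2) come from the description; the particular
# referenced layers are hypothetical.
REFERENCES_USED = {
    "2d_video": [],
    "depth": ["2d_video"],
    "occlusion_video": ["2d_video", "depth"],
    "occlusion_depth": ["2d_video", "depth"],
    "transparency": ["2d_video", "depth"],
}


def rank_by_references_used(references_used):
    """Order layers so that the layer employing the fewest references is first."""
    return sorted(references_used, key=lambda layer: len(references_used[layer]))


ranked_by_use = rank_by_references_used(REFERENCES_USED)
print(ranked_by_use)
```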

Further, different combinations may be applied to determine a priority. For example, a weighting function considering both a given layer's importance in rendering a 3D view and the number of layers that reference the given layer may be used to determine forwarding priorities. Moreover, it should be understood that other types of references in addition to the inter-layer references may be considered. For example, the above-described priority determinations may further consider the temporal references and inter-view references on which a particular layer depends. Thus, the above-described reasoning may be applied to any type of references and/or combination of references, such as temporal references and inter-view references, and/or inter-layer references.
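One possible form of such a weighting function is sketched below. The linear blend, the weight `alpha`, and the importance scores are all assumptions made for illustration; the description above does not specify a particular function or particular values.

```python
def weighted_priorities(importance, times_referenced, alpha=0.5):
    """Rank layers by a blend of rendering importance and reference count.

    importance: layer -> score in [0, 1] (hypothetical values).
    times_referenced: layer -> number of layers that reference it.
    alpha: assumed weight given to importance; (1 - alpha) goes to the count.
    """
    # Normalize counts to [0, 1]; guard against an all-zero count table.
    max_refs = max(times_referenced.values(), default=0) or 1
    scores = {
        layer: alpha * importance[layer]
        + (1 - alpha) * times_referenced.get(layer, 0) / max_refs
        for layer in importance
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical inputs for five 3DV layers.
importance = {
    "2d_video": 1.0,
    "depth": 0.8,
    "occlusion_video": 0.4,
    "occlusion_depth": 0.3,
    "transparency": 0.2,
}
times_referenced = {"2d_video": 3, "depth": 1}

weighted = weighted_priorities(importance, times_referenced)
print(weighted)
```

Counts for temporal or inter-view references could be folded into `times_referenced` in the same way, in line with the remark above that the reasoning applies to any type of reference.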

At step 3404, the parser 3302 may receive data units for constructing 3DV layers. For example, with reference again to FIG. 11, the parser 3302 may receive NAL units 16, 14, and 5 that are employed to construct a 2D video layer 1118. The parser 3302 may further receive NAL units 16 and 20 that are used to construct a depth layer 1120, etc.

At step 3406, the network traffic monitor 3306 may measure traffic/network congestion on the network 3350. A variety of known network traffic monitors may be employed here, as understood by those of ordinary skill in the art.

At step 3408, the controller 3304, based on congestion measurements received from the network traffic monitor 3306, may determine whether a first congestion threshold is met by the network traffic measured at step 3406. It should be understood that, here, optionally, a plurality of different congestion thresholds may be employed and associated with 3DV layers in accordance with the determined forwarding priorities, which may be based on the inter-layer dependency structure, as discussed above. For example, one congestion threshold may be used for each 3DV layer employed to render the 3DV content or for each droppable 3DV layer. For example, with reference again to Table 1, if the forwarding priorities are determined in accordance with the 3DV layer ID number as discussed above with regard to step 3402, then the first threshold may be associated with the transparency layer, the second threshold, which corresponds to a higher level of network congestion than the first threshold, may be associated with the occlusion depth layer, the third threshold, which corresponds to a higher level of network congestion than the first and second thresholds, may be associated with the occlusion video layer, etc.

Thus, if the first congestion threshold is met, then, at step 3412, the controller 3304 may drop units or NAL units received at step 3404 for the 3DV layer having the lowest priority and may direct the forwarding module 3308 to forward the units for the remaining 3DV layers (if the next threshold is not met) to the receiving system/apparatus 800. If the first congestion threshold is not met, then the forwarding module 3308, under the direction of the controller 3304, may forward units for all 3DV layers at step 3410. It should be understood that the threshold determinations may be repeated for each of an N number of 3DV layers. For example, the N number of layers may correspond to the number of layers employed within one or more views to render the 3DV content. As such, the threshold determinations may be repeated for each threshold and unit dropping and forwarding decisions may be made depending on the results.

For example, if after step 3412, the second threshold is not met, then units for N−1 3DV layers may be forwarded by the forwarding module 3308 at step 3412 to the receiving system/apparatus 800. Alternatively, if after step 3412, the controller 3304 determines that the first N−2 thresholds are met, then the method may proceed to step 3414, in which the controller 3304 may determine whether the (N−1)th congestion threshold is met. If the (N−1)th congestion threshold is not met, then the forwarding module 3308, under the direction of the controller 3304, may, at step 3416, forward units for the 3DV layers having the highest two priorities. In addition, at step 3416, the controller 3304 may drop the N−2 lowest priority 3DV layers, as the thresholds for the N−2 lowest priority 3DV layers have been met. If the (N−1)th congestion threshold is met, then the forwarding module 3308, under the direction of the controller 3304, may, at step 3418, forward units for the 3DV layer having the highest priority. Additionally, controller 3304 may drop units for the (N−1) lowest priority 3DV layers. Accordingly, method 3400 may proceed through threshold determinations such that when the Mth threshold is met and the (M+1)th threshold is not met, then the units, for example, NAL units, for the M lowest priority 3DV layers are dropped and the remaining higher priority layers are forwarded. It should be noted that, in this example, only N−1 thresholds are considered so that at least the highest priority layer is not dropped, ensuring that the receiving apparatus/system 800 can decode at least some content. However, variations of method 3400 may be employed. It should also be noted that one or more steps of method 3400 may be repeated periodically to account for any changes in network congestion.
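The threshold logic of method 3400 may be summarized by the following minimal sketch. It assumes that congestion is a single scalar measurement, that a threshold is "met" when the measurement is greater than or equal to it, and that the thresholds are supplied in ascending order; the function name, parameter names, and numeric values are hypothetical.

```python
def select_layers_to_forward(layers_by_priority, thresholds, congestion):
    """Split layers into (forwarded, dropped) for a congestion measurement.

    layers_by_priority: layer names, highest priority first (N layers).
    thresholds: ascending congestion thresholds; only the first N-1 are
        used, so the highest priority layer is never dropped.
    congestion: assumed scalar congestion measurement.
    """
    usable = thresholds[: len(layers_by_priority) - 1]
    met = 0
    for threshold in usable:
        if congestion >= threshold:  # assumption: "met" means >= threshold
            met += 1
        else:
            break
    # When M thresholds are met, the M lowest priority layers are dropped.
    keep = len(layers_by_priority) - met
    return layers_by_priority[:keep], layers_by_priority[keep:]


layers = ["2d_video", "depth", "occlusion_video", "occlusion_depth", "transparency"]
thresholds = [0.2, 0.4, 0.6, 0.8]  # hypothetical congestion levels

forwarded, dropped = select_layers_to_forward(layers, thresholds, 0.5)
print(forwarded, dropped)
```

With the hypothetical values shown, a measurement of 0.5 meets the first two thresholds, so the two lowest priority layers are dropped and the remaining three are forwarded. Re-running the function on each new measurement corresponds to repeating the steps periodically, as noted above.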

It should be clear that other implementations are possible, other than method 3400. One such implementation is more general and includes accessing syntax elements that indicate an inter-layer dependency structure among three-dimensional video (3DV) layers. This accessing may be performed, for example, by parsing received syntax elements as shown in step 3402.

The implementation also determines a transmission priority for a particular 3DV layer of the 3DV layers based on the structure. A transmission priority may be, for example, a priority related to forwarding a picture (or a part of a picture) or dropping a picture (or a part of a picture) from the stream. The transmission priority may be determined, for example, by determining how many layers use the particular 3DV layer as a reference (inter-layer reference, inter-view reference, and/or temporal reference).

The implementation also determines whether to transmit encoded data belonging to the particular 3DV layer. The determination of whether to transmit is based on the determined transmission priority for the particular 3DV layer and based on an indication of network congestion. Network congestion may be determined, for example, as in step 3406. An indication of network congestion may include, for example, a flag (or set of flags) that indicates whether one or more congestion thresholds have been satisfied, as in steps 3408 and 3414. Other indicators may include, for example, measures of network activity (throughput rates, error rates, numbers or rates of retransmission requests, numbers or rates of acknowledgements, etc.).

A further implementation accesses such a transmission priority, and determines whether to transmit encoded data belonging to the particular 3DV layer based on the accessed transmission priority for the particular 3DV layer and based on an indication of network congestion. This implementation, however, need not access syntax indicating the inter-layer dependency structure among the 3DV layers. This implementation also need not determine, based on the inter-layer dependency structure, the transmission priority.

It should also be clear that a transmission priority may be based, in whole or in part, on other information. Such information may include, for example, a temporal level ID, a priority ID, or a view ID, as related, for example, to AVC, MVC, or SVC systems.

We thus provide one or more implementations having particular features and aspects. However, features and aspects of described implementations may also be adapted for other implementations.

Several of the implementations and features described in this application may be used in the context of the H.264/MPEG-4 AVC (AVC) Standard, or the AVC standard with the MVC extension, or the AVC standard with the SVC extension. Additionally, implementations may be used in the context of a coding standard or coding proposals from (a) the Joint Collaborative Team for Video Coding (JCT-VC) from MPEG and ITU-T, (b) the High-performance Video Coding group from MPEG, (c) the Next Generation Video Coding group from the Video Coding Experts Group (VCEG) of ITU-T, (d) the 3D Video Coding group from MPEG, (e) any other group associated with one or more of MPEG or ITU-T, or (f) a standard (proprietary or public) developed by a company. However, these implementations and features may be used in the context of another standard (existing or future), or in a context that does not involve a standard.

Further, implementations may signal information using a variety of techniques including, but not limited to, SEI messages, slice headers, other high level syntax, non-high-level syntax, out-of-band information, datastream data, and implicit signaling. Accordingly, although implementations described herein may be described in a particular context, such descriptions should in no way be taken as limiting the features and concepts to such implementations or contexts.

Additionally, many implementations may be implemented in one or more of an encoder, a decoder, a post-processor processing output from a decoder, or a pre-processor providing input to an encoder. Furthermore, other implementations are contemplated by this disclosure.

Reference in the specification to “one embodiment” or “an embodiment” or “one implementation” or “an implementation”, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment” or “in one implementation” or “in an implementation”, as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment.

It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of,” for example, in the cases of “A/B,” “A and/or B” and “at least one of A and B,” is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C” and “at least one of A, B, or C,” such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended, as is readily apparent to one of ordinary skill in this and related arts, for as many items as are listed.

Also, as used herein, the words “picture” and “image” are used interchangeably and refer, for example, to all or part (a portion) of a still image or all or part (a portion) of a picture from a video sequence. More generally, a picture refers, for example, to any set of image or video data. A picture may be, for example, a pixel, a macroblock, a slice, a frame, a field, a full picture, a region bounding an object in the picture, the foreground of the picture, the background of the picture, or a particular set of (x,y) coordinates in the picture. Similarly, a “portion” of a picture may be, for example, a pixel, a macroblock, a slice, a frame, a field, a region bounding an object in the picture, the foreground of the picture, the background of the picture, or a particular set of (x,y) coordinates in the picture. As another example, a depth picture (depth image) may be, for example, a complete depth map or a partial depth map that only includes depth information for, for example, a single macroblock of a corresponding video frame.

Additionally, those of skill in the art will appreciate that a layer (or a “video” or “image” or “picture”) may refer to any of various video components or their combinations. Such components, or their combinations, include, for example, luminance, chrominance, Y (of YUV or YCbCr or YPbPr or YPcPr), U (of YUV), V (of YUV), Cb (of YCbCr), Cr (of YCbCr), Pb (of YPbPr), Pr (of YPbPr or YPcPr), Pc (of YPcPr), red (of RGB), green (of RGB), blue (of RGB), S-Video, and negatives or positives of any of these components. Further, these different types of components may be used with the described implementations. For example, a YUV set of components may be used with one or more described implementations, and in a typical implementation YUV are combined at the macroblock level. Additionally, other picture types may be used with the implementations and features described herein. Such other picture types may include, for example, pictures that include information other than 2D video, depth, occlusion or background, transparency or edge discontinuities.

Additionally, this application or its claims may refer to “determining” various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, identifying the information from a list or other set of data, or retrieving the information from memory.

Similarly, “accessing” is intended to be a broad term. Accessing a piece of information may include any operation that, for example, uses, stores, sends, transmits, receives, retrieves, modifies, parses, or provides the information.

Many implementations refer to a “reference”. A “reference” may be, for example, the traditional reference in which a pixel-based differential from a reference is used in predicting a source. A reference may also, or alternatively, be used in different ways to predict a source. For example, in one implementation, an edge location, or a measure of edge discontinuity, is used in predicting the source. In general, any information may be borrowed from the reference to aid in predicting the source. The examples of information such as pixel values, edge locations, and a measure of edge discontinuities have been given, but other types of information are possible as well.

The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants (“PDAs”), and other devices that facilitate communication of information between end-users.

Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications, particularly, for example, equipment or applications associated with data encoding and decoding. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a pre-processor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.

Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette, a random access memory (“RAM”), or a read-only memory (“ROM”). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.

As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described embodiment, or to carry as data the actual syntax-values written by a described embodiment. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.

It should be appreciated that in the above description of implementations various features are sometimes grouped together in a single implementation, figure, or description for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that a claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects may lie in less than all features of a single foregoing disclosed embodiment. Thus, it is understood that each of the claims also provides a separate implementation.

A number of implementations have been described. Nevertheless, it will be further understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Further, operations may be interchanged among functional blocks. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application and are within the scope of the following claims.

1-26. (canceled)
27. A method comprising: determining a priority of an inter-layer reference for a picture relative to one or more non-inter-layer references for the picture, wherein the inter-layer reference is of a different content type than the picture; and including the inter-layer reference in an ordered list of references for the picture based on the priority.
28. The method of claim 27, further comprising determining the inter-layer reference for the picture based on dependency information for the picture.
29. The method of claim 27, further comprising using the inter-layer reference in a coding operation involving the picture.
30. The method of claim 29, wherein the coding operation comprises encoding the picture based, at least in part, on the inter-layer reference.
31. The method of claim 29, wherein the coding operation comprises decoding the picture based, at least in part, on the inter-layer reference.
32. The method of claim 28, wherein the dependency information for the picture is accessed from received syntax elements.
33. The method of claim 27, wherein the including further comprises including at least one of a temporal reference or an inter-view reference in the ordered list.
34. The method of claim 30, wherein the encoded picture and one or more other pictures describe three-dimensional (3D) information for a given view at a given time, and the method further comprises: encoding the one or more other pictures; generating syntax elements that indicate how the encoded picture and the one or more other encoded pictures fit into a structure that supports 3D processing, the structure defining content types for pictures; and generating a bitstream that includes the encoded picture, the one or more other encoded pictures, and the syntax elements, the inclusion of the syntax elements providing at a coded-bitstream level indications of relationships between the encoded picture and the one or more other encoded pictures in the structure.
35. The method of claim 31, wherein the picture and one or more other pictures describe three-dimensional (3D) information for a given view at a given time, and the method further comprises: accessing an encoding of the picture and encodings of the one or more other pictures from a bitstream, the multiple pictures describing different three-dimensional (3D) information for a given view at a given time; accessing syntax elements from the bitstream, the syntax elements indicating for the picture and the one or more other pictures how the picture and the one or more other pictures fit into a structure that supports 3D processing, the structure providing a defined relationship between the picture and the one or more other pictures; decoding the one or more other encoded pictures; and providing the decoded picture and the one or more other decoded pictures in an output format that indicates the defined relationship between the decoded picture and the one or more other decoded pictures.
36. The method of claim 30, further comprising: generating a bitstream that includes the encoded picture.
37. The method of claim 27, wherein the picture and the inter-layer reference are different pictures from a set that includes a two-dimensional (2D) video picture, a depth picture, an occlusion video picture, an occlusion depth picture, or a transparency picture.
38. The method of claim 27, wherein the picture is an occlusion video picture and the inter-layer reference is a two-dimensional (2D) video picture.
39. The method of claim 38, wherein the including further comprises including the inter-layer reference at the beginning of the ordered list of references for the occlusion video picture.
40. The method of claim 27, wherein the picture is an occlusion depth picture and the inter-layer reference is a depth picture.
41. The method of claim 40, wherein the including further comprises including the inter-layer reference at the beginning of the ordered list of references for the occlusion depth picture.
42. The method of claim 27, wherein the picture is a transparency picture and the inter-layer reference is a two-dimensional (2D) video picture.
43. The method of claim 42, wherein the including further comprises including the inter-layer reference at the end of the ordered list of references for the transparency picture.
44. The method of claim 27, wherein the picture is a depth picture and the inter-layer reference is a two-dimensional (2D) video picture.
45. The method of claim 44, wherein the including comprises including the inter-layer reference after available temporal references and inter-view references in the ordered list of references for the depth picture.
46. The method of claim 44, wherein the inter-layer reference is included at the end of the ordered list of references for the depth picture.
47. The method of claim 27, wherein: the picture includes multiple portions that are encoded separately from each other, and the priority of the inter-layer reference is based on how frequently the inter-layer reference is used as a reference for the portions of the picture.
48. The method of claim 28, wherein the determining of the inter-layer reference further comprises: determining whether the picture is a two-dimensional (2D) video picture and, if the picture is a 2D video picture, excluding any depth pictures as inter-layer references in the ordered list of references for the picture.
49. The method of claim 27, wherein determining the priority and including the inter-layer reference are performed in one or more of an encoder or a decoder.
50. The method of claim 27, wherein the inter-layer reference is a reference picture from a set of pictures that includes the picture, and the set of pictures is from a given time and a given view, and individual pictures in the set of pictures describe different three-dimensional information for the given time and the given view.
51. An apparatus comprising: means for determining a priority of an inter-layer reference for a picture relative to one or more non-inter-layer references for the picture, wherein the inter-layer reference is of a different content type than the picture; and means for including the inter-layer reference in an ordered list of references for the picture based on the priority.
52. A processor readable medium having stored thereon instructions for causing a processor to perform at least the following: determining a priority of an inter-layer reference for a picture relative to one or more non-inter-layer references for the picture, wherein the inter-layer reference is of a different content type than the picture; and including the inter-layer reference in an ordered list of references for the picture based on the priority.
53. An apparatus, comprising a processor configured to perform at least the following: determining a priority of an inter-layer reference for a picture relative to one or more non-inter-layer references for the picture, wherein the inter-layer reference is of a different content type than the picture; and including the inter-layer reference in an ordered list of references for the picture based on the priority.
54. An apparatus comprising: a three-dimensional video (3DV) reference buffer configured to determine a priority of an inter-layer reference for a picture relative to one or more non-inter-layer references for the picture, wherein the inter-layer reference is of a different content type than the picture, and to include the inter-layer reference in an ordered list of references for the picture based on the priority.
55. The apparatus of claim 54, further comprising one or more coders configured to use the inter-layer reference in a coding operation involving the picture, and wherein the 3DV reference buffer is further configured to determine the inter-layer reference for the picture based on dependency information for the picture.